M arch h of the Frob blins s: - AMD Developer Central
-
Upload
khangminh22 -
Category
Documents
-
view
1 -
download
0
Transcript of M arch h of the Frob blins s: - AMD Developer Central
AN
5
4
5
6
7
AdvancesinReNTatarchuk(E
52|P a g e
SimCr
jeremyshopf joshbarczak chrisoatam natalyatatarch
ealͲTimeRendeEditor)
Mmularowd
Figure
amdcom amdcom mdcom
hukamdcom
eringin3DGra
archation s of
Cre
1 Froblins
m
JeremyShopf4
Gam
aphicsandGam
h of and Inteleatur
navigate to
JoshuaBarczak5
meComputA
mesCoursendashS
the Rendlligenres o
a mushroom
ChristopOat
tingApplicAMDInc
IGGRAPH2008
Frobderinnt anon GP
m patch goal
pher6
NTat
cationsGro
8
blinsng Mand DePU
l and harves
Natalyatarchuk7
oup
Cha
s assiv
etaile
t food
apter 3
ve ed
Chapter 3 March of the Froblins
53|P a g e
31 BeyondPrettyPicturestowardIntelligentInteractiveExperiences Artificial intelligence(AI) isgenerallyconsideredtobeoneofthekeycomponentsofacomputergameSometimeswhenweplayagamewemaywish that thecomputeropponentswerewrittenbetterAtthose times while playing against the computer we feel that the game is unbalanced Perhaps thecomputerplayerhasbeengivendifferentsetof rulesoruses thesame rulesbuthasmore resources(healthweapons etc) The complexity of underlying AI systems alongwith game design belies theresulting feelingwehavewhenplayinganygameAs theCPUandGPUspeedandpowercontinues togrowalongwithincreasingmemoryamountsandbandwidthgamedevelopersareconstantlyimprovingthegraphicsof theirgames In the last fiveyears theproductionqualityofgameshasbeen increasing(alongwiththecorrespondingbudgets)RecentgameswooplayerswithincrediblebreakthroughsinrealͲtime3DgraphicscomplexityoftheworldsandcharactersaswellasvariouspostͲprocessingeffectsAndwhile there had been tremendous improvements for parallelizing rendering through the evolution ofconsumerGPUpipelinesartificialintelligencecomputationsaretreadingbehindTodatetherehadbeenratherfewattemptsatparallelizingAIcomputationsTypicallyinagameAIcontrolsthebehaviorofnonͲplayerͲcharacters(NPC)whethertheyarefriendlytotheplayeroractasgameopponentsThismay includeactualcharactersor itcansimplybetanksandarmies(suchasinarealͲtimestrategygame)ormonstersinafirstͲpersonshooterTheuniformfeelingisthebettertheAIisthebetterthegameAmoresophisticatedAIsystemallowsformoreinterestingandfungameplayArtificial intelligence isused forvariouspartsof thegameTypicalcomputations includepathfindingobstacleavoidanceanddecisionsmakingThesecalculationsareneededregardlessofthegenre of interactive entertainment be it a realͲtime strategy game anMMORPG or a firstͲpersonshooterWewillalsosoonseedynamiccharacterͲcentricentertainmentintheformofinteractivemovieswheretheviewerwillhavecontrolovertheoutcomeInmany scenarios the AI computations include dynamic path finding This involves autoͲsimulatingcharactersrsquobehaviorandorrunningaterrainanalysistoidentifygoodorupdatevalidpathsasresultofgameplayThesecomputationscanbequiteahogonCPUtimebudgeteveninmultiͲcorescenariosAsaresultmanygamedevelopersarelookingforwaystominimizetheCPUhitofpathfindingBecausepathfindingandAIingeneralissuchacomputeͲintensiveexpensivecalculationweoftenseeboringzombieͲlike NPC interaction Furthermore when gameplay and physics are simulated on the CPU and thecharactersare renderedon theGPU there isanadditionalPCIͲEdata transferoverhead for characterpositionsandstateIfwecanutilizetheGPUforrunninginͲgameAIcodenotonlywecanspeeduppathfindingbutwecanalsointroduceanumberofotherinterestingeffectsOurcharacterscanstartlivingontheirownresultinginsoͲcalledldquoemergentbehaviorsrdquondashsuchaslaneformationqueuingandreactionstoothercharactersandsoonAndthismeansthatgameplaywillbealotmorefunWith this inmindwe setout toexplore thenextvisual frontier for interactiveexperience combiningmassivelyparallelAIcomputationswithhighfidelityrenderingalgorithmsInthischapterwewilldescribethesimulationandrenderingmethodsusedfortheAMDFroblinsdemodesignedtoshowcasemanyofthe new approaches for characterͲcentric entertainment These techniques aremade possible by themassivelyparallelcomputeavailableonthelatestcommodityGPUssuchasATIRadeonregHD4800seriesIn our fantasy largeͲscale environment with thousands of highly detailed intelligent characters the
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
54|P a g e
Froblins (frog goblins) are concurrently simulated animated and rendered entirely on the GPU Theindividualcharacter logicforeachfroblincreature iscontrolledviaacomplexshader(over3200shaderinstructions)We are utilizing the latest functionality availablewith the DirectXreg 101 API hardwaretessellation high fidelity rendering with 4X MSAA settings at HD resolution with gammaͲcorrectrendering full high dynamic range FP16 pipeline and advanced postͲprocessing effects The crowdbehavior simulation is performed directly on theGPUWewill describe aGPUͲfriendly pathͲplanningframeworkforlargeͲscalecrowdsimulationThisframeworkcanalsobeusedtosimulatelargercrowdsofsimplifiedagentswithsmallerpolygonalcountOursystemhasbeenusedtosimulate65000agentsatrealͲtimeframeratesonasinglecommodityGPUBycombiningacontinuumͲbasedglobalpathplannerwithafineͲgrainedagentͲbased localavoidancemodelwecanperformexpensiveglobalplanningatacoarseresolutionand lowerupdateratewhile the localmodel takescareofavoidingotheragentsandnearby obstacles at a higher frequency To our knowledge this is the firstmassive crowd simulationperformedentirelyonaGPUInourinteractiveenvironmentwerenderthousandsofanimatedintelligentcharactersfromavarietyofviewpoints ranging from extreme closeͲups (with individual characters rendered at over 16 milliontriangles forcloseͲupdetail) to farawayldquobirdrsquoseyerdquoviewsof theentiresystem (over three thousandscharacters at the same time) Our system combines stateͲofͲtheͲart parallel artificial intelligencecomputationfordynamicpathfindingandlocalavoidanceontheGPUmassivecrowdrenderingwithLODmanagementwithhighendrenderingcapabilitiessuchasGPUtessellationforhighqualitycloseͲupsandstableperformance terrain systemcascaded shadows for largeͲrangeenvironmentsandanadvancedglobal illumination systemWe are able to render ourworld at interactive rates (over 20 fps on ATIRadeonregHD4870)withstaggeringpolygoncount(6ndash8milliontrianglesonaverageat20Ͳ25fps)whilemaintainingthefullhighqualitylightingandshadowingsolution
32 ArtificialIntelligenceonGPUforDynamicPathfinding321 GlobalPathfindingManysystemsforcrowdsimulationsrelyonagentͲbasedsolutionswherethemovementiscomputedforindividualagentsseparatelyWhiletherearecertainadvantagestothisapproach(independentdecisionsfor each agent individual visibility and environment information) it is also challenging to developbehavioralrules fortheagentsthatresult inaconsistentandrealisticoverallcrowdmovementAtthesame time scalingagentͲbasedmethods fora largenumberofagents isprohibitively computationallyexpensivewhichisaconcernforinteractivescenariossuchasvideogamesFor our crowd simulation solution we chose a continuumͲbased approach similar to the ContinuumCrowds work by Treuille et al [TCP06] Thismethod convertsmotion planning into an optimizationproblem usingwellͲknown numericalmethods from optics and general physics for stable navigationsolutionThis type ofmethod is particularly well suited for simulating large numbers of agents because it iscomputed spatially instead of perͲagent and results in smooth movement with no ldquodeadͲendsrdquoAdditionallyacontinuumapproachresults inflowͲlikemovementwhich ischaracteristicofactual large
C
cdmIrtinTa
wisc
Fmtt
Chapter 3 Ma
crowdsThedecision logmethodspro
nthiscontinreferred toaterrainslopen the envirneighboring
Thiscost funanylocation
where isܨ tntuitive toshortestͲpatcomputinga
Figure 2 Emovement dithat the arrothreat
arch of the Fr
globalmodgic and theoducesmoot
nuumͲbasedasa speedeetc)andaronment Thlocationand
nction isthetotheneare
the positivesee how thh from evepropagating
Example of irections Asows near th
oblins
elisonlyanerefore it isthandrealis
dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe
enusedas iestgoalThi
eͲvalued coshe functionry locationgwavefront
dynamic pas the large ge ldquomonster
napproximaaugmented
sticcrowdm
ulationtheThis cost funactor(basednction descevaluatethe
nputtoasospotential(
ԡሺݔሻԡ ൌ
st function could beݔ In this ctfollowingt
ath finding ghost Froblirdquo are pointi
ationtoaccud by local
movemente
environmennction incorponagentdribes the teoptimalpa
olverthatca()iscalcula
ൌ ܨ
evaluated ie constructecontextwethepathof
in game-likin scares awing away d
uratelongͲtecollision avospeciallyina
ntisformulaporatesbotensitylargeravelͲtime tth
alculatestheatedsuchtha
(1)
in the direced by integrcan think oleastresista
ke scenarioway the littledirecting the
ermplanninoidance proareasofden
atedasacoh theachieeͲscaleobstato move fro
etotaltraveatitsatisfies
ction of therating the cof the globance
o The arrowe critters th
characters
5
gwithfullvoblem Togensecongestio
stfunctionvable speedaclesetc)foom one loc
elͲtime (potestheeikona
e gradient ost functional crowdmo
ws visualizeey scamper away from
55|P a g e
visibilityandether theseon
(sometimesd (basedonorlocationscation to a
ential) fromlequation
ሻݔሺ It isn along theovement as
e character away Note a potential
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
56|P a g e
By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows
1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring
KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof
KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist
NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure
Figure 3 Illustration of neighboring cells used in finite difference approximation
Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation
ቀఝಾఝಾቁଶቀఝಾఝಾ
ቁଶൌ ͳ (3)
Chapter 3 March of the Froblins
57|P a g e
Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows
1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf
thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire
tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3
toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor
thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
53|P a g e
31 BeyondPrettyPicturestowardIntelligentInteractiveExperiences Artificial intelligence(AI) isgenerallyconsideredtobeoneofthekeycomponentsofacomputergameSometimeswhenweplayagamewemaywish that thecomputeropponentswerewrittenbetterAtthose times while playing against the computer we feel that the game is unbalanced Perhaps thecomputerplayerhasbeengivendifferentsetof rulesoruses thesame rulesbuthasmore resources(healthweapons etc) The complexity of underlying AI systems alongwith game design belies theresulting feelingwehavewhenplayinganygameAs theCPUandGPUspeedandpowercontinues togrowalongwithincreasingmemoryamountsandbandwidthgamedevelopersareconstantlyimprovingthegraphicsof theirgames In the last fiveyears theproductionqualityofgameshasbeen increasing(alongwiththecorrespondingbudgets)RecentgameswooplayerswithincrediblebreakthroughsinrealͲtime3DgraphicscomplexityoftheworldsandcharactersaswellasvariouspostͲprocessingeffectsAndwhile there had been tremendous improvements for parallelizing rendering through the evolution ofconsumerGPUpipelinesartificialintelligencecomputationsaretreadingbehindTodatetherehadbeenratherfewattemptsatparallelizingAIcomputationsTypicallyinagameAIcontrolsthebehaviorofnonͲplayerͲcharacters(NPC)whethertheyarefriendlytotheplayeroractasgameopponentsThismay includeactualcharactersor itcansimplybetanksandarmies(suchasinarealͲtimestrategygame)ormonstersinafirstͲpersonshooterTheuniformfeelingisthebettertheAIisthebetterthegameAmoresophisticatedAIsystemallowsformoreinterestingandfungameplayArtificial intelligence isused forvariouspartsof thegameTypicalcomputations includepathfindingobstacleavoidanceanddecisionsmakingThesecalculationsareneededregardlessofthegenre of interactive entertainment be it a realͲtime strategy game anMMORPG or a firstͲpersonshooterWewillalsosoonseedynamiccharacterͲcentricentertainmentintheformofinteractivemovieswheretheviewerwillhavecontrolovertheoutcomeInmany scenarios the AI computations include dynamic path finding This involves autoͲsimulatingcharactersrsquobehaviorandorrunningaterrainanalysistoidentifygoodorupdatevalidpathsasresultofgameplayThesecomputationscanbequiteahogonCPUtimebudgeteveninmultiͲcorescenariosAsaresultmanygamedevelopersarelookingforwaystominimizetheCPUhitofpathfindingBecausepathfindingandAIingeneralissuchacomputeͲintensiveexpensivecalculationweoftenseeboringzombieͲlike NPC interaction Furthermore when gameplay and physics are simulated on the CPU and thecharactersare renderedon theGPU there isanadditionalPCIͲEdata transferoverhead for characterpositionsandstateIfwecanutilizetheGPUforrunninginͲgameAIcodenotonlywecanspeeduppathfindingbutwecanalsointroduceanumberofotherinterestingeffectsOurcharacterscanstartlivingontheirownresultinginsoͲcalledldquoemergentbehaviorsrdquondashsuchaslaneformationqueuingandreactionstoothercharactersandsoonAndthismeansthatgameplaywillbealotmorefunWith this inmindwe setout toexplore thenextvisual frontier for interactiveexperience combiningmassivelyparallelAIcomputationswithhighfidelityrenderingalgorithmsInthischapterwewilldescribethesimulationandrenderingmethodsusedfortheAMDFroblinsdemodesignedtoshowcasemanyofthe new approaches for characterͲcentric entertainment These techniques aremade possible by themassivelyparallelcomputeavailableonthelatestcommodityGPUssuchasATIRadeonregHD4800seriesIn our fantasy largeͲscale environment with thousands of highly detailed intelligent characters the
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
54|P a g e
Froblins (frog goblins) are concurrently simulated animated and rendered entirely on the GPU Theindividualcharacter logicforeachfroblincreature iscontrolledviaacomplexshader(over3200shaderinstructions)We are utilizing the latest functionality availablewith the DirectXreg 101 API hardwaretessellation high fidelity rendering with 4X MSAA settings at HD resolution with gammaͲcorrectrendering full high dynamic range FP16 pipeline and advanced postͲprocessing effects The crowdbehavior simulation is performed directly on theGPUWewill describe aGPUͲfriendly pathͲplanningframeworkforlargeͲscalecrowdsimulationThisframeworkcanalsobeusedtosimulatelargercrowdsofsimplifiedagentswithsmallerpolygonalcountOursystemhasbeenusedtosimulate65000agentsatrealͲtimeframeratesonasinglecommodityGPUBycombiningacontinuumͲbasedglobalpathplannerwithafineͲgrainedagentͲbased localavoidancemodelwecanperformexpensiveglobalplanningatacoarseresolutionand lowerupdateratewhile the localmodel takescareofavoidingotheragentsandnearby obstacles at a higher frequency To our knowledge this is the firstmassive crowd simulationperformedentirelyonaGPUInourinteractiveenvironmentwerenderthousandsofanimatedintelligentcharactersfromavarietyofviewpoints ranging from extreme closeͲups (with individual characters rendered at over 16 milliontriangles forcloseͲupdetail) to farawayldquobirdrsquoseyerdquoviewsof theentiresystem (over three thousandscharacters at the same time) Our system combines stateͲofͲtheͲart parallel artificial intelligencecomputationfordynamicpathfindingandlocalavoidanceontheGPUmassivecrowdrenderingwithLODmanagementwithhighendrenderingcapabilitiessuchasGPUtessellationforhighqualitycloseͲupsandstableperformance terrain systemcascaded shadows for largeͲrangeenvironmentsandanadvancedglobal illumination systemWe are able to render ourworld at interactive rates (over 20 fps on ATIRadeonregHD4870)withstaggeringpolygoncount(6ndash8milliontrianglesonaverageat20Ͳ25fps)whilemaintainingthefullhighqualitylightingandshadowingsolution
32 ArtificialIntelligenceonGPUforDynamicPathfinding321 GlobalPathfindingManysystemsforcrowdsimulationsrelyonagentͲbasedsolutionswherethemovementiscomputedforindividualagentsseparatelyWhiletherearecertainadvantagestothisapproach(independentdecisionsfor each agent individual visibility and environment information) it is also challenging to developbehavioralrules fortheagentsthatresult inaconsistentandrealisticoverallcrowdmovementAtthesame time scalingagentͲbasedmethods fora largenumberofagents isprohibitively computationallyexpensivewhichisaconcernforinteractivescenariossuchasvideogamesFor our crowd simulation solution we chose a continuumͲbased approach similar to the ContinuumCrowds work by Treuille et al [TCP06] Thismethod convertsmotion planning into an optimizationproblem usingwellͲknown numericalmethods from optics and general physics for stable navigationsolutionThis type ofmethod is particularly well suited for simulating large numbers of agents because it iscomputed spatially instead of perͲagent and results in smooth movement with no ldquodeadͲendsrdquoAdditionallyacontinuumapproachresults inflowͲlikemovementwhich ischaracteristicofactual large
C
cdmIrtinTa
wisc
Fmtt
Chapter 3 Ma
crowdsThedecision logmethodspro
nthiscontinreferred toaterrainslopen the envirneighboring
Thiscost funanylocation
where isܨ tntuitive toshortestͲpatcomputinga
Figure 2 Emovement dithat the arrothreat
arch of the Fr
globalmodgic and theoducesmoot
nuumͲbasedasa speedeetc)andaronment Thlocationand
nction isthetotheneare
the positivesee how thh from evepropagating
Example of irections Asows near th
oblins
elisonlyanerefore it isthandrealis
dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe
enusedas iestgoalThi
eͲvalued coshe functionry locationgwavefront
dynamic pas the large ge ldquomonster
napproximaaugmented
sticcrowdm
ulationtheThis cost funactor(basednction descevaluatethe
nputtoasospotential(
ԡሺݔሻԡ ൌ
st function could beݔ In this ctfollowingt
ath finding ghost Froblirdquo are pointi
ationtoaccud by local
movemente
environmennction incorponagentdribes the teoptimalpa
olverthatca()iscalcula
ൌ ܨ
evaluated ie constructecontextwethepathof
in game-likin scares awing away d
uratelongͲtecollision avospeciallyina
ntisformulaporatesbotensitylargeravelͲtime tth
alculatestheatedsuchtha
(1)
in the direced by integrcan think oleastresista
ke scenarioway the littledirecting the
ermplanninoidance proareasofden
atedasacoh theachieeͲscaleobstato move fro
etotaltraveatitsatisfies
ction of therating the cof the globance
o The arrowe critters th
characters
5
gwithfullvoblem Togensecongestio
stfunctionvable speedaclesetc)foom one loc
elͲtime (potestheeikona
e gradient ost functional crowdmo
ws visualizeey scamper away from
55|P a g e
visibilityandether theseon
(sometimesd (basedonorlocationscation to a
ential) fromlequation
ሻݔሺ It isn along theovement as
e character away Note a potential
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
56|P a g e
By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows
1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring
KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof
KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist
NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure
Figure 3 Illustration of neighboring cells used in finite difference approximation
Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation
ቀఝಾఝಾቁଶቀఝಾఝಾ
ቁଶൌ ͳ (3)
Chapter 3 March of the Froblins
57|P a g e
Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows
1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf
thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire
tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3
toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor
thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
54|P a g e
Froblins (frog goblins) are concurrently simulated animated and rendered entirely on the GPU Theindividualcharacter logicforeachfroblincreature iscontrolledviaacomplexshader(over3200shaderinstructions)We are utilizing the latest functionality availablewith the DirectXreg 101 API hardwaretessellation high fidelity rendering with 4X MSAA settings at HD resolution with gammaͲcorrectrendering full high dynamic range FP16 pipeline and advanced postͲprocessing effects The crowdbehavior simulation is performed directly on theGPUWewill describe aGPUͲfriendly pathͲplanningframeworkforlargeͲscalecrowdsimulationThisframeworkcanalsobeusedtosimulatelargercrowdsofsimplifiedagentswithsmallerpolygonalcountOursystemhasbeenusedtosimulate65000agentsatrealͲtimeframeratesonasinglecommodityGPUBycombiningacontinuumͲbasedglobalpathplannerwithafineͲgrainedagentͲbased localavoidancemodelwecanperformexpensiveglobalplanningatacoarseresolutionand lowerupdateratewhile the localmodel takescareofavoidingotheragentsandnearby obstacles at a higher frequency To our knowledge this is the firstmassive crowd simulationperformedentirelyonaGPUInourinteractiveenvironmentwerenderthousandsofanimatedintelligentcharactersfromavarietyofviewpoints ranging from extreme closeͲups (with individual characters rendered at over 16 milliontriangles forcloseͲupdetail) to farawayldquobirdrsquoseyerdquoviewsof theentiresystem (over three thousandscharacters at the same time) Our system combines stateͲofͲtheͲart parallel artificial intelligencecomputationfordynamicpathfindingandlocalavoidanceontheGPUmassivecrowdrenderingwithLODmanagementwithhighendrenderingcapabilitiessuchasGPUtessellationforhighqualitycloseͲupsandstableperformance terrain systemcascaded shadows for largeͲrangeenvironmentsandanadvancedglobal illumination systemWe are able to render ourworld at interactive rates (over 20 fps on ATIRadeonregHD4870)withstaggeringpolygoncount(6ndash8milliontrianglesonaverageat20Ͳ25fps)whilemaintainingthefullhighqualitylightingandshadowingsolution
32 ArtificialIntelligenceonGPUforDynamicPathfinding321 GlobalPathfindingManysystemsforcrowdsimulationsrelyonagentͲbasedsolutionswherethemovementiscomputedforindividualagentsseparatelyWhiletherearecertainadvantagestothisapproach(independentdecisionsfor each agent individual visibility and environment information) it is also challenging to developbehavioralrules fortheagentsthatresult inaconsistentandrealisticoverallcrowdmovementAtthesame time scalingagentͲbasedmethods fora largenumberofagents isprohibitively computationallyexpensivewhichisaconcernforinteractivescenariossuchasvideogamesFor our crowd simulation solution we chose a continuumͲbased approach similar to the ContinuumCrowds work by Treuille et al [TCP06] Thismethod convertsmotion planning into an optimizationproblem usingwellͲknown numericalmethods from optics and general physics for stable navigationsolutionThis type ofmethod is particularly well suited for simulating large numbers of agents because it iscomputed spatially instead of perͲagent and results in smooth movement with no ldquodeadͲendsrdquoAdditionallyacontinuumapproachresults inflowͲlikemovementwhich ischaracteristicofactual large
C
cdmIrtinTa
wisc
Fmtt
Chapter 3 Ma
crowdsThedecision logmethodspro
nthiscontinreferred toaterrainslopen the envirneighboring
Thiscost funanylocation
where isܨ tntuitive toshortestͲpatcomputinga
Figure 2 Emovement dithat the arrothreat
arch of the Fr
globalmodgic and theoducesmoot
nuumͲbasedasa speedeetc)andaronment Thlocationand
nction isthetotheneare
the positivesee how thh from evepropagating
Example of irections Asows near th
oblins
elisonlyanerefore it isthandrealis
dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe
enusedas iestgoalThi
eͲvalued coshe functionry locationgwavefront
dynamic pas the large ge ldquomonster
napproximaaugmented
sticcrowdm
ulationtheThis cost funactor(basednction descevaluatethe
nputtoasospotential(
ԡሺݔሻԡ ൌ
st function could beݔ In this ctfollowingt
ath finding ghost Froblirdquo are pointi
ationtoaccud by local
movemente
environmennction incorponagentdribes the teoptimalpa
olverthatca()iscalcula
ൌ ܨ
evaluated ie constructecontextwethepathof
in game-likin scares awing away d
uratelongͲtecollision avospeciallyina
ntisformulaporatesbotensitylargeravelͲtime tth
alculatestheatedsuchtha
(1)
in the direced by integrcan think oleastresista
ke scenarioway the littledirecting the
ermplanninoidance proareasofden
atedasacoh theachieeͲscaleobstato move fro
etotaltraveatitsatisfies
ction of therating the cof the globance
o The arrowe critters th
characters
5
gwithfullvoblem Togensecongestio
stfunctionvable speedaclesetc)foom one loc
elͲtime (potestheeikona
e gradient ost functional crowdmo
ws visualizeey scamper away from
55|P a g e
visibilityandether theseon
(sometimesd (basedonorlocationscation to a
ential) fromlequation
ሻݔሺ It isn along theovement as
e character away Note a potential
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
56|P a g e
By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows
1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring
KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof
KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist
NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure
Figure 3 Illustration of neighboring cells used in finite difference approximation
Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation
ቀఝಾఝಾቁଶቀఝಾఝಾ
ቁଶൌ ͳ (3)
Chapter 3 March of the Froblins
57|P a g e
Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows
1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf
thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire
tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3
toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor
thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
cdmIrtinTa
wisc
Fmtt
Chapter 3 Ma
crowdsThedecision logmethodspro
nthiscontinreferred toaterrainslopen the envirneighboring
Thiscost funanylocation
where isܨ tntuitive toshortestͲpatcomputinga
Figure 2 Emovement dithat the arrothreat
arch of the Fr
globalmodgic and theoducesmoot
nuumͲbasedasa speedeetc)andaronment Thlocationand
nction isthetotheneare
the positivesee how thh from evepropagating
Example of irections Asows near th
oblins
elisonlyanerefore it isthandrealis
dcrowdsimufunction)Tavoidancefahis cost fundisusedtoe
enusedas iestgoalThi
eͲvalued coshe functionry locationgwavefront
dynamic pas the large ge ldquomonster
napproximaaugmented
sticcrowdm
ulationtheThis cost funactor(basednction descevaluatethe
nputtoasospotential(
ԡሺݔሻԡ ൌ
st function could beݔ In this ctfollowingt
ath finding ghost Froblirdquo are pointi
ationtoaccud by local
movemente
environmennction incorponagentdribes the teoptimalpa
olverthatca()iscalcula
ൌ ܨ
evaluated ie constructecontextwethepathof
in game-likin scares awing away d
uratelongͲtecollision avospeciallyina
ntisformulaporatesbotensitylargeravelͲtime tth
alculatestheatedsuchtha
(1)
in the direced by integrcan think oleastresista
ke scenarioway the littledirecting the
ermplanninoidance proareasofden
atedasacoh theachieeͲscaleobstato move fro
etotaltraveatitsatisfies
ction of therating the cof the globance
o The arrowe critters th
characters
5
gwithfullvoblem Togensecongestio
stfunctionvable speedaclesetc)foom one loc
elͲtime (potestheeikona
e gradient ost functional crowdmo
ws visualizeey scamper away from
55|P a g e
visibilityandether theseon
(sometimesd (basedonorlocationscation to a
ential) fromlequation
ሻݔሺ It isn along theovement as
e character away Note a potential
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
56|P a g e
By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows
1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring
KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof
KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist
NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure
Figure 3 Illustration of neighboring cells used in finite difference approximation
Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation
ቀఝಾఝಾቁଶቀఝಾఝಾ
ቁଶൌ ͳ (3)
Chapter 3 March of the Froblins
57|P a g e
Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows
1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf
thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire
tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3
toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor
thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
56|P a g e
By following thegradientof thegeneratedpotential fieldagentsareguaranteed toalwaysbemovingalongtheshortestpathtotheglobalgoalconsideringthespeedatwhichanagentcantravelbasedonterrainfeaturesobstaclesandagentdensity(congestion)3211GlobalPathfindingCPUImplementationA fast and simpleͲtoͲunderstand computational algorithm to approximate the solution to the eikonalequationistheFastMarchingMethod[TSITSIKLIS95]whichwewillsummarizehereBecausethepotentialisonlyknownforthegoallocationwebeginbysetting ൌ ͲatthegoalcellandaddingthiscelltoalistofKNOWNcellsAllothercellsareaddedtoanUNKNOWNlistwiththeirpotentialsettoλThealgorithmthenproceedsasfollows
1) AllUNKNOWNcellsadjacenttoaKNOWNcellareaddedtoaNEIGHBORlist2) The potential at eachNEIGHBOR cell is calculated based on the potential at the neighboring
KNOWNcellsandthecosttogetfromtheKNOWNcellstotheNEIGHBORcellinquestion3) Update theNEIGHBORcellwith thesmallestpotential found instep2andadd it to the listof
KNOWNcells4) RepeatuntilallcellsareintheKNOWNlist
NotethattheabovealgorithmisidenticaltoDijkstrarsquosmethodThedifferencebetweenDijkstrarsquosandtheFastMarchingMethodisthewaythatthepotentialiscalculatedinstep2SolvingthecontinuouseikonalequationbyusingDijkstrarsquosmethodonadiscretegridwillnotconvergewewillalwaysgetstairͲsteppingartifactsregardlessofthenumberoftimesyourefinethegridstructure
Figure 3 Illustration of neighboring cells used in finite difference approximation
Tsitsiklispresentsa finitedifferenceapproximation to the continuouseikonalequation thateliminatesthestairͲsteppingproblemFirsttheupwinddirectionsareidentifiedastheleastcostlyneighborsinthexandydirections(seealsoFigure3) ௫ ൌ ఢሼௐǡாሽሼ ܥெሽ ௬ ൌ ఢሼேǡௌሽሼ ܥெሽ (2)Thefinitedifferenceapproximationisthencomputedusingthegreatestsolutiontoெinthequadraticequation
ቀఝಾఝಾቁଶቀఝಾఝಾ
ቁଶൌ ͳ (3)
Chapter 3 March of the Froblins
57|P a g e
Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows
1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf
thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire
tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3
toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor
thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
57|P a g e
Inthecasethat௫or௬isundefined(neitherneighboralonganaxisisKNOWN)theneliminatethetermcontaining that axis from the equationOnceெis found for all cells the gradient can be easilycalculatedHowever theFastMarchingMethod isa serialalgorithmandnotamenable toparallelizationandbyextensionnothighlysuitableforefficientGPUcomputation3212GlobalPathfindingGPUImplementationLuckily there exists amethod for solving the eikonal equation in parallel The Fast IterativeMethod[JEONGWHITAKER07A JEONGWHITAKER07B]uses thesameupwind finitedifferenceapproximationdescribedinSection3211butrequiresnoordereddatastructurestomaintainlistssuchasKNOWNUNKOWNetcThe idea is toonlyperformupdates to thepotential functionat thebandof cellswhichareactive Inpracticea listof individualactivecellsdoesnotneedtobemaintainedA listofactivetilesorspatiallycoherentblocksofcells ismaintained Intuitivelythe listofactivetiles is initializedtocontainthetilecontainingthesource(thegoalinourapplication)Wesummarizethealgorithmasfollows
1) Iteratentimesonallcellsintheactivetiles2) CompareeachcellineachactivetiletothepreviouslycomputedpotentialvalueforthatcellIf
thedifferenceiswithinsomesmallthresholdmarkitasconverged3) Foreachactivetileperformareductionontheconvergenceresultstodetermineiftheentire
tileisconverged4) Performoneiterationonalltilesneighboringthetilesdeterminedtohaveconvergedinstep3
toseeifanycellvalueschange5) Update theactive listof tiles to reflectall tiles thatbecame inactivedue toconvergenceor
thatwereidentifiedasbeingreactivatedinstep4The authors reported 4Ͳ6X performance improvements over optimizedCPU implementations for theirtileͲbasedimplementationHoweverforour implementationweareabletofurthersimplifythisalgorithmbecausethecomplexityofourcostfunctiondoesnotvarygreatlyandourcellgridsaresmallrelativetothelargedatasetsusedintheauthorsrsquoworkBecauseourdatasetsaresmall(1282or2562)theconstantoverheadofperformingtestsforconvergenceoutweighsthegainsfromcullingcomputationandimpactsperformancenegativelyInotheraspectsouralgorithm is similar to theabove Inorder toensure thatour solverconvergesweneed tobeable tomakeaconservativeestimateofthenumberofiterationsneededThenumberofiterationsusedinoursolverwasdeterminedempiricallybyexaminingworstͲcasecostfunctioncomplexityBy calculating four eikonal solutions at oncewe are able to achieve 98 ALU utilization on an ATIRadeonTMHD4870withGDDR5memoryThisyieldsveryhighcomputationalthroughputOurGPUbasedsolver computesa2562 solution in20mswhich is faster thanourCPU implementationbya factorofapproximately45TheperformancedatawascollectedonanAMDPhenomtradeX4QuadͲCoreCPUsystem
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AN
5
wrA
3SfwetdaOc
Fo3UahaTc
AdvancesinReNTatarchuk(E
58|P a g e
with2GBofregularenginA
3213Co
Solving theefunctionfor
where isequivalenttothecostreladifferentpatagentdensit
Once thescacalculateሺݔ
Figure 4 Lefof the blue m
322 Loc
Unfortunateavoideachohaveanaccuavoidancem
The basic gocollisions wi
ealͲTimeRendeEditor)
fRAM and aneandmem
onstructing
eikonalequtheenviron
the final cooslopeaswatedtodynathingdepenynearagoa
alarcost funሻThegradݔ
eft to Right Cmarker
calNavigat
lysolving totherwithacuratebehavmodelthatre
oal of a locith nearby a
eringin3DGra
anATIRademoryclocks
theCostFu
ation requirmentinthe
ሻݔሺܨ
ost functionwellasanylaamichazardsdingonthealtoprevent
nctionܨሺݔሻdientofሺݔሻ
Cost functio
tionandA
heeikonalecceptablefidiormodelfoesolvesthese
calmodel isagents and
aphicsandGam
eonTM4870HLSLsource
unction
resacost fuFroblinsdem
ሻ ൌ ሺݔሻ
n is the srgestaticobsandsituationFtovercrowd
isconstructሻiscalculate
n potential
Avoidance
equationatdelityisprohorourcharaefineͲgraine
s to providealso to nav
mesCoursendashS
graphics carecodeforo
unction thatmoiscompu
ሻݔሺܦ
staticmovebjectssuchaareweighForexampleingatgoals
ted (Figureedusingcen
gradient of
aresolutionhibitivelyexctersweauedobstacles
e each indivvigate arou
IGGRAPH2008
rdwith512uriterative
tcanbeevautedasfollow
ሻǡݔሺܣ
ement costasbuildings)htsthatcanitmaybed
4) itcanbentraldifferen
f potential T
nhighenouxpensiveforugmentour
vidual agentnd obstacle
8
2MBofGDDeikonalsolv
aluatedatews
(4)
(including tisthedebeadjusteddesiredtoin
esupplied tnces
The goals (so
ugh for largearealͲtimeglobaleikon
twith a veles and agen
DR5 videomverislistedi
achgridcel
terrainmovensityofagedspatiallytoncreasethe
to theeikon
ources) are a
enumbersoapplicationnalsolution
ocity thatwnts towards
memory andinAppendix
l Thecost
ement costntsandisoencouragecostdueto
alsolver to
at the center
ofagentstoInordertowithalocal
will preventits desired
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
59|P a g e
destination This is typicallyhandledby a continuous cycleof examining thenearby environment andreactingbasedonthediscoveredinformation3221MethodOur local navigation and avoidance model computes agent velocities by examining the movementdirection determined by the global model and the positions and velocities of nearby agents ThisavoidancemodelisbasedontheVelocityObstacleformulation[FIORINISHILLER98vBPS08]ForourmodelwepresenteachagentasadiscEachagentthereforehasapositionavelocityaradiusamaximumspeedandaglobalgoaldirectionprovidedby theglobalsolverWe inferanorientationɅifrombyassumingthattheagentisorientedtowardsInourapplicationallagentshaveasimilarradiusandthereforeisconstantforallagentsAsinmostlocalmodelsupdatingandforrequiresknowledgeoftheandforallagents˒whereisthesetofallagentswithinacertaindistance 3222SpatialQueriesviaNovelGPUBinning Determining the positions and velocities of dynamic local obstacles requires a spatial data structurecontaining all obstacle information in the simulationWe developed a novelmultiͲpass algorithm forsortingagentsintospatialbinsdirectlyonGPUOuralgorithmusesa2Ddepthtexturearrayandasingle2DcolorbuffertoconstructadatastructureforstoringagentsinbinsThedepthtexturearrayservesasourAgentIDArrayAgiven2DtexeladdressinthisarrayservesasabinAsinglebinisa1Darraytexturearray sliceThebinarraygrowsdown through successive texturearray slicesEach sliceof the texturearraycontainsasingleagentID(binelement)TheagentIDsarestoredinbinsinascendingsortedorderbyagent IDThenumberofagentsthatfall intoagivenbinmaybe lessthanthebincapacity(which isdefinedbythenumberofdeptharrayslices)InordertoefficientlyquerytheagentIDsinagivenbinweuseaBinCounterTheBinCounterisa2DcolorbufferthatrecordstheloadoneachbinintheAgentIDArrayTofindallagentsnearaparticularworldspacepositionthepositionistranslatedintoa2DbinaddressAnytranslationfunctionmaybeusedOurworlddomainissquaresoasimpleuniformgridwasusedtomapworld spacepositions tobins Once thebinaddress isknown thebin load is read from theBinCounter Thisgivesusthenumberofagentsthatare inthebinweare interested in FinallytheeachagentrsquosIDinthebinisreadfromtheAgentIDarrayThis data structure must be updated each frame as agents move about the world Updates areperformedusingan iterativealgorithmthatbeginswithallagent IDs inabuffercalledtheworkingsetEach iterationasagentsareplaced intobins theyare removed from theworking set Thealgorithmcontinuestoiterateuntiltheworkingsetisreducedtozero
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AN
6
FbWIccpbatgbsdsoFdttlstdpaAdPstt
AdvancesinReNTatarchuk(E
60|P a g e
Figure 5 Agbin These ID
WebeginbyDArrayarecurrent depcontainingapointprimitibymappingagentIDissthatarelessgivenbinonbedrawn inshadersimpldidnotrecesinglebinthonasubsequ
For the secodepthtargetthefirstpasstimetheveressthanoresettingtheirto less thandepthbufferpass Perforavoidinserti
After vertexdepthvaluePointsthatashaderissettheendoftthis iteration
ealͲTimeRendeEditor)
gent positionDs are later
yclearingthclearedto1th buffer allagent IDsiveAseachtheassociattoredasthethanthedenlytheagennto thatbinlyoutputs1iveanyageheagentwituentpass
ond iterationtandtheagessothesecrtexshaderdequaltothedepthvaluefunction s
rwhichresurmingtheldquogngclipkillin
shadingpoandonlyallaremarkedfttooutput2hispassthenalongwith
eringin3DGra
ns are rasterused to retri
eBinCount10Forthend the Bin isboundahpointpasstedagentrsquoswepointrsquosdepepthvaluestwiththelo Sinceweresultinginntswillremththelowes
nof thealgentsareproond iteratiodoessomeaeIDstoredinetosomevaomuch likultsinthepogreaterthannstructionsi
ointsarepaowsnonͲrejforrejection2sothattheeresultingshall theage
aphicsandGam
rized into a ieve agent po
ersto0toifirstiteratioCounter is
as the inputesthroughworldpositipthvalueTtoredintheowestID(cocanonlywthebincou
mainsettotstIDwillget
gorithm theocessedonceononceagaiadditionalwntheprevioualueoutsidee depthͲpeeointwiththenrdquotest inthnourpixels
assed toagectedpointsnaresimplyeBinCountestreamͲoutbents thatha
mesCoursendashS
bin counterositions and
ndicatethatonthetopͲms bound astvertexbuffthevertexsontoabinTheGPUrsquosdedepthbufforrespondingrite a singlenterbeingsheir initialcwrittenand
second sliceagainNoaintakesas iwork itrejecusAgentIDofthevalideling [EVERITelowestIDtevertexshashaderanda
geometry shstobothbediscardedn
erwillbesetbufferwillcavenotyet
IGGRAPH2008
r and an arrad velocities
tthebinsarmostsliceofthe currenferandeacshaderthepaddressasmdepthͲtestunferAsaresgtothepoineagent to aetto1atbinclearedvaluedotheragen
ceof theAgagentswerenputaworkctsthecurreArrayslicedepthrange
TT01]we arthatisgreateaderratherallowstheG
hader Theesenttothenotrasterizetto2atlocacontainallthbeenbinne
8
ay containin
reemptyandftheAgentIt color targhelement ipointrsquosscreementionedanitisconfigsultofallthntwiththelagivenbinnsthatreceeof0 Ifmuntswillbere
gent IDArraeremovedfrkingsetconentpointprPointsaremeThedeptre effectivelerthanthethanthepixPUtoperfo
geometry srasterizeraedandnotsationswhereheagentsthd The stre
ng IDs of ag
dallslicesoDArrayisboget The ws rasterizedenspacepoaboveTheuredtopassheagentsthowestdepthper iteratioivedanagenultipleagentejectedtobe
ay is setasromthewotainingallaimitive if itsmarkedasldquorhunitisstilly implemenpreviouslybxelshaderarmearlyͲzcu
hader testsandtobestrstreamedouepointsarehatwerebineamͲoutbuf
gents in that
oftheAgentoundastheworking setasa singleositionissetnormalizedsfragmentsatmaptoahvalue)willn thepixelntBinsthattsmaptoaeprocessed
the currentrkingsetongents ThissagentID isrejectedrdquobylconfigurednting a dualbinnedIDtoallowsustoulling
thepointrsquosreamedoututThepixelwrittenAtnnedduringfferwillnot
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
61|P a g e
containagentsthatwerebinnedinthepreviousiterationsincetheywillhavebeenmarkedforrejectionduringthevertexshaderrsquosldquogreaterthanrdquotestandthuswillnothavebeenstreamedoutinthegeometryshader This streamͲout buffer becomes the new working set and is used as input for subsequentiterationsofthealgorithmSubsequentpassesfollowmuchliketheseconditerationofthealgorithmEachtimethedepthtargetissettothenextsliceoftheagentIDarraythepixelshaderissettooutputcurrentiterationnumberandanewworkingsetiscreatedforuseinthenextpassEachiterationresultsinareducedworkingsetThealgorithmcontinuesto iterateuntiltheworkingset isreducedtozero Anoverflowconditionoccurs ifthe iteration count reachesorexceeds theAgent IDArraydepthbefore theworking set is reduced tozeroThiscanoccuriftoomanyagentslandinagivenbinbutinpracticeoverflowcanbepreventedbyusinga largeenoughnumberofbinssothatagentsaresufficientlydistributedtoavoidoverflow AlsothedepthoftheAgentIDArraycanbeincreasedtoaccommodatehigherbinloadsApingͲponging technique isused tomanage theworking setbuffers The ldquopingrdquobuffer contains thecurrentworking setandactsas inputduringone iterationwhile the ldquopongrdquobufferactsas theoutputbufferTherolesofthepingandpongbuffersareswappedaftereachiterationTwo techniques are used to avoid CPUGPU synchronizations thatwould result in rendering pipelinestalls Predicated rendering featureofDirect3Dreg10 isused tocontrol theexecutionofeach iterationIdeallythealgorithmshouldonlycontinuetoiterateaslongasthepreviousiterationresultedinastreamͲoutbufferwithnonͲzero length Unfortunately ifwewere tocontrolexecutionon theCPUby issuingGPUqueriesaftereachpasstodetermine ifthealgorithmhadcompletedwewould introducestalls intherenderingpipelineduetoCPUGPUsynchronizationandthiswoulddegradeperformanceToavoidsynchronizationstallsallthedrawcallsforthemaximumnumberofiterations(correspondingtothemaximumallowablebin load)aremadeupfront WestillwantthealgorithmtoterminateonceallagentshavebeenbinnedsoweusepredicateddrawcallstoterminateuponcompletionThedrawcallsforeach iterationarepredicatedon the condition that theprevious iteration resulted inagentsbeingstreamedoutIfnoagentsarestreamedoutthenweknowtheworkingsethasbeenreducedtozeroandwecanterminateUsingcascadingpredicateddrawcallsinthiswaywillresultintheremainingdrawcallsbeingskipped Thus theGPU takes fullresponsibility for terminating thealgorithmonceall theagentshavebeenbinnedTheDirect3Dreg10DrawAutocallisusedtoissueeachpredicateddrawsincewedonotknowthesizeoftheworkingsetfromiterationtoiterationOur spatialdata structureprovides somebenefitsoverprevious techniques [HARADA07] Queryingourdatastructure isefficientbecausewestorebin loads inaBinCounterthusallowingustoonlyreadthenecessarynumberofelementsfromtheAgentIDArrayandevenearlyͲoutwhenabin isdeterminedtobeempty Additionallyour techniqueprovidesamechanism fordetectingoverflowemploys iterativestreamͲoutreductionoftheworkingsetandgivesexecutioncontroltotheGPUtoavoidpipelinestalls
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AN
6
3AEssmtftcTf
wadt
FasTtd
AdvancesinReNTatarchuk(E
62|P a g e
3223Ag
AgentrsquosdirecEachagentesolutionWesuitable nummotionfideltodeterminefitnessfunctto collision icircleisequa
The updatedfunctionresu
ݐ ൌ
whereݓisaagentsݐሺሻdirectionstotimeͲdeltasi
Figure 6 Eaand velocitieshown
Twoexampltwo sets aredirectiongi
ealͲTimeRendeEditor)
gentMove
ctionsneedevaluatesaneusedfivedmber of dirityversuspeethetimetionbasedoisdeterminealtothedisc
d velocity (Eult(Equation
ሻሺݏݏ ൌ
ൌ ఢ
ൌ పෝ
aperͲagentሻreturnstheoevaluatencethelast
ach agent eves of agents
esetsofdire identicalWehaveal
eringin3DGra
mentDirec
tochangeanumberoffdirectionsinections forerformancetocollisionwntheangleedbyevalucradiusroft
Equation 7)n6)andthe
ൌݓݐሺሻ
ሺݏݏݐ ሺݏǡ పෝሺݐݏ
factoraffecteminimumtistheglobsimulationf
valuates a fin its curre
rectionstobin the locasochosent
aphicsandGam
ctionDeterm
asaresultoixeddirectioourapplicaour applicaoverheadfowithagentsrelativetotating a swetheagents
is then casmallesttim
ሺ ȉ
ሻ
పሻ Τݐ ሻ
tingthepreftimetocollisbalnavigatioframe
fixed numberent and adja
beevaluatedl coordinatetoevaluate
mesCoursendashS
mination
ofpathfindinonsrelativeationMoreation was dorcomputininthecurrethedesiredgept circleͲcir
lculated basmetocollisio
ሻǤ ͷ Ǥͷ
ferencetomsionwithallondirection
r of potentia
acent bins T
dforvelocitye frame defonlydirectio
IGGRAPH2008
ngand localtothegoalcanbeuseddeterminedngmoredireentoradjacglobaldirectrcle collision
sed on theoninthatdir
moveinthegagentsindiisthespeݏ
al movemenTwo agents a
yupdatearfined by thonsthatwo
8
lavoidancedirectiondedforincreasempiricallyectionsEachcentbinsEationandthen test inwh
direction wrection
globaldirectirectioneedofagent
t directions and their co
eshownine agent andouldcausea
modelsrsquocometerminedbysedmotionfby evaluathdirectioniachdirectioetimetocolhich the rad
with the larg
(5)
(6)
(7)
tionoravoidisthesetotandݐ
based on thorresponding
Figure6Nod its globalldquorightturn
mputationsytheglobalfidelityTheing desiredsevaluatedn isgivenallisionTimediusofeach
gest fitness
dnearbyfdiscreteistheݐ
he positions g V sets are
otethatthe navigationrdquochange in
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
63|P a g e
orientation This eliminates the need to arbitratewhich direction an agent should turn based on thevelocitiesofotheragentsandalsoresults in fewercollision tests Inpractice this limitation isnotverynoticeableordistractingIt is importanttonotethatweemployasimplekinodynamicconstrainttorestrictanagentrsquoschange invelocityItisdesiredtothrottlethechangeinorientationinagiventimestepbecauseouragentshaveaphysical limit tohow fast they can change theirorientations Thisprevents suddenbroad changes inorientationindenseagentsituations323AgentStateManagementAsagentnavigatearound theenvironment it is important tomaintain informationabout theircurrentstateThis includesdatasuchascurrentpositionandvelocitycurrentgroupIDcurrentanimationcycle(or action) and current time within that animation cycleWe alsomaintain perͲagent data such asmaximumspeedandgoalachievementdistanceGoalachievementdistanceisarandomvalueisusedtodeterminethedistancefromthegoalatwhichanagenthasreachedthatgoalThis isratherspecifictoourspecificgoal typessuchasmushroomandgoldpatcheswhere thegoalhasaspecificareaand theboundariesarenebulous AgentanimationcycletransitionsareperformedusingdynamicflowcontrolwithintheshadercontrollingagentupdatelogicIfthecurrenttimewithinananimationcycleisgreaterthanthelengthofthecurrentanimation several conditions are checked to determinewhat animation cycle should be part of theagentrsquosstateThesearecurrentanimationcycledistancefromnearestgoalnumberofagentsnearbydistancefromfearinducingobstaclessuchastheghostfroblinandnoxiousgascloudsandcurrentgroupWhile these agent updates will not be very coherent the agent data texture is very small and theperformance impact isslightDespitethedimensionalityof inputtotheanimationtransitionfunction itmaybepossible toprecompute theanimation transition logic into textures (asdone in [MHR07])andeliminateflowcontrol
324PathfindingResultsDiscussion
Ourapproachhasbeenusedtosimulate~65000agentsatinteractiveratesonanATIRadeonTMHD4870while performing intensive rendering tasks such as multiͲmillion triangle scene rendering globalillumination approximation atmospheric scattering and highͲquality cascaded shadow mapping Allresultswere collectedon anAMDPhenomtradeX4QuadͲCoreCPU systemwith2GBofRAM and anATIRadeonTM4870graphics cardwith512MBofGDDR5 videomemoryand standardengineandmemoryclocksThemainbottleneck inourapplication is renderingamassivenumberofagents In thecaseofsimulating65Kagentsweuseasimplifiedagentmodel(acylinder)Simulation(globalandlocal)alonefor65K agents on the above system is 45 fps Simulation of crowd behavior and interaction alongwithrendering for this largenumberof agents is 31 fpsNote that evenwithusing a very simple cylindermodelduetoextremelylargenumberofagentswearerendering98MtrianglesinthelattercaseAllofourtestingresultswerecollectedrenderedatHDresolutionwith4XMSAA
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
64|P a g e
WhilewechosetousethetwocrowdbehaviorsimulationtechniquesforourglobalandlocalnavigationtheyalsocanbeusedseparatelyastheyeachhavetheirownadvantagesanddisadvantagesThecontinuumapproachusedforourglobalsolverworkswell intheFroblinsscenariowheretherearelargenumbersofagentswithonlya fewtypesofgoalsMorecomplexscenarioswithdiversetasksarelikely to be incompatible with this type of approach Another disadvantage of using a continuumapproach is that all agents aremodeled as having global knowledge An agentwillmake navigationchoicesbasedonobstacles thatarenotvisible to itwhichmayalsonotbedesiredWe feel that thecontinuum approach would be excellent for ambient crowds That is large groups of nonͲplayercharacters which are present mostly for scenery but are expected to navigate around a dynamicenvironmentormovingcharactersThelocalmodelalsohasdisadvantagesBylimitinglocalavoidancevelocitiestobeofaclockwisenaturethevariation inagent interaction issomewhat limitedUsingasmalldiscretesetof localdirections fornavigationcanalso lead tooscillationbetween twodirectionscreatingdistractingbehaviorWhile thelocal avoidancemodelwill prevent collisions in very densely packed situations scenarios arisewhereagentscandeadlockandwillbecomestuckThistypicallyhappensatsinksinagentnavigationssuchasatasmallgoalOnceagentsbecomedenselypackedaroundagoalagentsthatreachthegoalwillbeunableto navigate out of the goal area This could be solved by incorporating varying levels of aggressivebehavior into agentmovement that causes agents to push each other out of theway This type ofapproach couldbeaugmentedbya compositeagentapproach [YCP08] to ldquotrailͲblazerdquopaths throughdenselypackedagents
33CharacterLODManagementTheoverarchinggoalofoursystemissimulationandrenderingofmassivecrowdsofcharacterswithhighlevelofdetailThe latestgenerationsofcommodityGPUsdemonstrate incredible increases ingeometryperformanceespeciallywiththe inclusionofGPUtessellationpipeline(Section35)Neverthelessevenwith stateͲofͲtheͲartgraphicshardware renderingmultiple thousandsofcomplexcharacterswithhighpolygonalcountsatinteractiveratesisverytaxingRenderingthousandsofcharacterswithoveramillionofpolygonseachisneitherpracticalnorwiseasinmanycasesthesecharactersmaybeverysmallonthescreen and therefore performance is wasted on the details that go unnoticed For this reason it isessential to use culling and level of detail (LOD) techniques in order tomake this rendering problemtractableCullingandLODmanagementhavetraditionallybeenCPUͲcentrictaskstradingamodestamountofCPUoverheadforamuch largerreduction intheGPUworkload HoweveracommondifficultyariseswhenthepositionaldataisgeneratedbyaGPUͲbasedsimulationandthereforewouldrequireacostlyreadͲbackoperationforCPUͲsidescenemanagementThis isexactlythesituationwersquoveencounteredaswesimulatedandanimatedourcharactersentirelyontheGPUThealternativeistousetheavailablecomputetoperformallcullingandscenemanagementdirectlyontheGPUInourFroblinsdemowesolvethisproblembyemployingDirect3Dreg10geometryshadersina
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
65|P a g e
novelwaytoperformcharactercullingandLODsortingentirelyontheGPUThisenablesustoperformthese tasksefficiently forGPUͲsimulatedcharactersTheunderlying ideascouldalsobeappliedwithaCPUsimulationinordertooffloadthescenemanagementfromtheCPU
331UsingStreamͲOutOperationsasFilteringOursystem takesadvantageof instancingsupportavailablewithDirect3Dreg10andDirect3Dreg101APIWerenderanarmyofcharactersasvaried instancedcharacterswith individualactionsandanimationscontrolledontheGPUThisnaturallyleadstothekeyideabehindourscenemanagementapproachtheuseofgeometryshadersthatactas filters forasetofcharacter instances A filteringshaderworksbytaking a set of point primitives as inputwhere each point contains the perͲinstance data needed torenderagivencharacter(positionorientationandanimationstate) ThefilteringshaderreͲemitsonlythosepointswhichpassaparticulartestwhilediscardingtherestTheemittedpointsarestreamedintoa bufferwhich can then be reͲbound as instance data and used to render the characters MultiplefilteringpassescanbechainedtogetherbyusingsuccessiveDrawAutocallsanddifferenttestscanbesetupsimplybyusingdifferentshadersInpracticeweuseasharedgeometryshadertoperformtheactualfilteringandperformthedifferentfilteringtestsinvertexshadersAsidefromprovidingmoremodularcodethisapproachcanalsoprovideperformancebenefitsThesourcetothisfilteringgeometryshaderisshowninListing1
struct GSInput XYZ contain the characterrsquos origin in world space W contains a group number which is used to vary character appearance float4 vPositionAndGroup PositionAndGroup XY contain the character orientation (a vector in the XZ plane) Z contains the index of the characterrsquos animation cycle W contains the time along the cycle (see section 35) float4 vDirection DirectionStateAndTime Result of predicate test 1 == emit 0 == do not emit float fResult TestResult struct GSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime [maxvertexcount(1)] void main( point GSInput vert[1] inout PointStreamltGSOutputgt outputStream ) [branch] if( vert[0]fResult == 1 ) GSOutput o ovPositionAndGroup = vert[0]vPositionAndGroup ovDirection = vert[0]vDirection outputStreamAppend( o )
Listing 1 A stream-filtering geometry shader
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
66|P a g e
332ViewͲFrustumCullingIt is straightforward to perform view frustum culling using a filtering geometry shader as describedabove ForviewͲfrustumcullingthevertexshadersimplyperformsan intersectioncheckbetweenthecharacterboundingvolumeandtheviewfrustumusingtheusualalgorithms(forexample[AHH08])Ifthe testpasses then the corresponding character is visible and its instancedata isemitted from thegeometryshaderandstreamedoutOtherwiseitisdiscardedAnexampleofacullingvertexshaderisgiveninListing2
struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirectionStateAndTime DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible TestResult Computes signed distance between a point and a plane vPlane Contains plane coefficients (abcd) where ax + by + cz = d vPoint Point to be tested against the plane float DistanceToPlane( float4 vPlane float3 vPoint ) return dot( float4( vPoint -1 ) vPlane ) Frustum cullling on a sphere Returns 1 if visible 0 otherwise float CullSphere( float4 vPlanes[6] float3 vCenter float fRadius ) for( uint i=0 ilt6 i++ ) entire sphere is outside one of the six planes cull immediately if( DistanceToPlane( vPlanes[i] vCenter ) gt fRadius ) return 0 return 1 float4 vFrustumPlanes[6] view-frustum planes in world space (normals face out) float3 vSphereCenter bounding sphere center relative to character origin float fSphereRadius bounding sphere radius VSOutput VS( VSInput i ) compute bounding sphere center in world space float3 vObjectPosWS = ivPositionAndGroupxyz float3 vSphereCenterWS = vBoundingSphereCenterxyz + vObjectPosWS perform view-frustum test float fVisible = CullSphere( vFrustumPlanes vSphereCenterWS fSphereRadius ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionStateAndTime = ivDirectionStateAndTime ofVisible = fVisible return o
Listing 2 Vertex shader for view-frustum culling
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
67|P a g e
333OcclusionCullingWe can also perform occlusion culling in this framework to avoid rendering characters which arecompletelyoccludedbymountainsorstructuresBecauseweareperformingourcharactermanagementontheGPUweareabletoperformocclusionculling inanovelwaybytakingadvantageofthedepthinformationthatexists inthehardwareZbuffer Thisapproachrequiresfar lessCPUoverheadthananapproachbasedonpredicatedrenderingorocclusionquerieswhilestillallowingcullingagainstarbitrarydynamicoccluders Ourapproach issimilar inspirittothehierarchicalZtestingthat is implemented inmodernGPUsandwasinspiredbytheworkof[GKM93]whousedahierarchicaldepthimagecombinedwithanoctreetoculloccludedgeometryinbulkAfter rendering allof theoccluders in the scenewe construct ahierarchicaldepth image from the Zbufferwhichwewill refer toasaHiͲZmap TheHiͲZmap isamipͲmappedscreenͲresolution imagewhereeachtexelinmiplevelicontainsthemaximumdepthofallcorrespondingtexelsinmipleveliͲ1InthemostdetailedmipleveleachtexelsimplycontainsthecorrespondingdepthvaluefromtheZbufferThis depth information can be collected during themain rendering pass for the occluding objects aseparatedepthpassisnotrequiredtobuildtheHiͲZmapAfter construction of the HiͲZ map occlusion culling can be performed by examining the depthinformation forpixelswhicharecoveredbyanobjectrsquosboundingsphereandcomparingthemaximumfetcheddepthtotheprojecteddepthofapointonthespherethat isnearesttothecamera Althoughthisapproachdoesnotprovideanexactocclusiontestitgivesaconservativeestimatethatworkswellinmanycasesandwillneverresultinfalseculling3331HiͲZMapConstructionForsingleͲsamplerenderingonecanusetheHiͲZmapasthemaindepthbufferforrenderingthescene(usingaDepthStencilviewofthefirstmiplevel)InDirect3Dreg101multiͲsampleddepthbufferscanalsobesupportedwithanextrafullͲscreenquadpassbyfirstcomputingthemaximumdepthofeachpixelrsquossubͲsamplesandstoringtheresultinthelowestleveloftheHiͲZmapSubsequentlevelsaregeneratedusingasequenceofreductionpasseswhichrepeatedlyfetchtexelsandcomputetheirmaximumasshown inListing3 BecausescreenͲsized imagestypicallydonotmipwellcaremust be taken when reducing oddͲsizedmip levels In this case the pixels on the oddͲsizedboundaryedgemustfetchadditionaltexelstoensurethattheirdepthvaluesaretakenintoaccountInaddition it isnecessarytouse integercalculations forthetextureaddressarithmeticbecause floatingͲpointerrorcanresultinincorrectaddressingwhenrenderingintothelowermiplevelsEachofthereductionpassesrenders intoonemip leveloftheHiͲZmapresourcewhilesamplingfromthepreviousoneThisisvalidapproachinDirect3Dreg10aslongastheresourceviewusedfortheinputmip level does not overlap the one being used for output (different input and output viewsmust becreatedforeachpass)
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
68|P a g e
struct PSInput Fractional pixel coordinates (05 15 25 etchellip) float4 vPositionSS SV_POSITION Dimensions of lsquotCurrentMiprsquo Can be obtained by calling lsquoGetDimensionsrsquo in the vertex shader nointerpolation uint2 vLastMipSize DIMENSION Texture2Dltfloatgt tCurrentMip sampler sPoint float4 main( PSInput i ) SV_TARGET get integer pixel coordinates uint3 nCoords = uint3( ivPositionSSxy 0 ) uint2 vLastMipSize = ivLastMipSize fetch a 2x2 neighborhood and compute the max nCoordsxy = 2 float4 vTexels vTexelsx = tCurrentMipLoad( nCoords ) vTexelsy = tCurrentMipLoad( nCoords uint2(10) ) vTexelsz = tCurrentMipLoad( nCoords uint2(01) ) vTexelsw = tCurrentMipLoad( nCoords uint2(11) ) float fM = max( max( vTexelsx vTexelsy ) max( vTexelszvTexelsw ) ) if we are reducing an odd-sized texture then the edge pixels need to fetch additional texels float2 vExtra if( (vLastMipSizex amp 1) ampamp nCoordsx == vLastMipSizex-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(20) ) vExtray = tCurrentMipLoad( nCoords uint2(21) ) fM = max( fM max( vExtrax vExtray ) ) if( (vLastMipSizey amp 1) ampamp nCoordsy == vLastMipSizey-3 ) vExtrax = tCurrentMipLoad( nCoords uint2(02) ) vExtray = tCurrentMipLoad( nCoords uint2(12) ) fM = max( fM max( vExtrax vExtray ) ) extreme case If both edges are odd fetch the bottom-right corner texel if( ( ( vLastMipSizex amp 1 ) ampamp ( vLastMipSizey amp 1 ) ) ampamp nCoordsx == vLastMipSizex-3 ampamp nCoordsy == vLastMipSizey-3 ) fM = max( fM tCurrentMipLoad( nCoords uint2(22) ) ) return fM
Listing 3 Pixel shader used for Hi-Z map construction 3332CullingwiththeHiͲZMapOnce we have constructed the HiͲZmap we perform another stream filtering pass which uses thisinformationtoperformocclusioncullingInordertoensureastableframerateitisdesirabletorestrictthe number of fetches that are performed for each character and to avoid divergent flow control
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
69|P a g e
betweencharacterinstancesWecanaccomplishthisgoalbyexploitingthehierarchicalstructureoftheHiͲZmapWe first compute the bounding square in image spacewhich fully encloses the characterrsquos projectedboundingsphere Wethenselectaspecificmip level intheHiͲZmapatwhichthesquarewillcovernomorethanone2times2texelneighborhood This2times2neighborhood isthenfetchedfromthemapandthedepthvaluesarecomparedagainsttheprojecteddepthofapointontheboundingspherethatisnearesttothecameraThestructureoftheHiͲZmapguaranteesthatifanyofthesetexelsoccludestheobjectthenalltexelsbeneathitwillalsooccludeAlthoughwehavechosentousea2times2neighborhoodalargeronecouldbeusedinsteadandwouldprovidemoreeffectivecullingattheexpenseofaddedoverheadinthecullingtestToobtaintheclosestpointontheboundingsphereonecansimplyusethefollowingformula
ݒ ൌ ݒܥ െ ቀ ௩ȁ௩ȁቁ ݎ (8)
HereistheclosestpointincameraspaceisthespherecenterincameraspaceandisthesphereradiusTheprojecteddepthofthispointwillbeusedforthedepthcomparisonsaheadNotethatifthecamera is inside thebounding sphere this formulawill result inapointbehind thenearplanewhoseprojecteddepthisnotwelldefinedInthiscasewemustrefrainfromcullingthecharactertopreventafalseocclusionTocomputethecharacterrsquosboundingsquarewefirstcalculateitsprojectedheightinscreenspacebasedon itsdistance from the imageplane Note thatwedefine screen space as the spaceobtained afterperspectiveprojectionTheheightinscreenspaceisgivenby ൌ
ௗ ୲ୟ୬ቀഇమቁ (9)
whereisthedistancefromthespherecentertotheimageplaneandȣistheverticalfieldofviewofthecameraThewidthoftheboundingsquareisequaltothisheightdividedbytheaspectratioofthebackbufferThesizeofthesquareinscreenspaceisequaltotwiceitssizeinNDCspace(whichisanormalizedspacestartingatthetopͲleftcornerofthescreen)NotealsothatfornonͲsquareresolutionsasquareonthescreenisactuallyarectangleinscreenspaceandNDCspaceWhensamplingtheHiͲZmapwewouldliketofetchfromthelowestlevelinwhichtheboundingsquarecoversatmostfourtexels ThiswillallowustouseafixednumberoffetchestorejectanysquarenomatteritssizeTochoosethelevelweneedonlyensurethatthesizeofthesquareissmallerthanthesizeofasingletexelatthechosenresolutionInotherwordswechoosethelowestlevelisuchthat
ቀௐଶቁ ͳ (10)
wherethewidthoftherectangleinpixelsisThisyieldsthefollowingequationfori
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
70|P a g e
ൌ ۀଶሺሻڿ (11)Thisholdsprovidedthatthewidthoftherectangle inpixels is largerthantheheight Ifthis isnotthecasetheheightinpixelsshouldbeusedinsteadThiswillhappenwhenevertheaspectratioislessthanoneOncewehavechosenthecorrectmiplevelweperformatexturefetchfromtheHiͲZmapateachcornerof thebounding square compute themaximum fetched value and compare itwith thedepthof thedepthofthepointPvHLSLcodeforocclusioncullingisgiveninListing4Thevertexshadershowninthelistingisusedtogetherwithafilteringgeometryshader(Listing1)tofilteroutcharacter instanceswhichareoccludedbyotherscene elements Remember that this calculation is only performed once per character not once perrenderedvertex334LODSelectionGiventheaboveLODselection isalsosimpleto implement WeuseadiscreteLODscheme inwhichadifferentlevelofdetailisselectedbasedonthedistancefromthecameratothecharacterrsquoscenterThisis implementedbyusing three successive filteringpasses to separate thecharacters into threedisjointsetsbasedon theirdistances to the camera These filteringpasses are applied to the resultsof thecullingstepssothatonlyvisiblecharactersareprocessedThecullingresultsarecomputedonceandreͲusedfortheLODselectionpasses Werenderthecharacters inthefinest(closest)LODusinghardwaretessellationanddisplacementmapping(seesection35)anduseconventionalrenderingforthemiddleLODandsimplifiedgeometryandpixelshadersforthefurthestLOD
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
71|P a g e
float4x4 mV Viewing transform float4x4 mP Projection transform float3 vCameraPosition Camera location in world space float fCameraFOV Cameras vertical field of view angle float fCameraAspect Camera aspect ratio float4 vSphere Bounding sphere center (XYZ) and radius (W) object space float4 vViewport XYWidthHeight Texture2Dltfloatgt tHiZMap sampler sHiZPoint struct VSInput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime struct VSOutput float4 vPositionAndGroup PositionAndGroup float4 vDirection DirectionStateAndTime float fVisible IsVisible VSOutput VS( VSInput i ) VSOutput o ovPositionAndGroup = ivPositionAndGroup ovDirectionxyzw = ivDirectionxyzw ofVisible = 1 compute bounding sphere center in camera space float3 vAgentCenterWS = vSpherexyz + ivPositionAndGroupxyz float3 Cv = mul( mV float4( vAgentCenterWSxyz 1 ) )xyz Do not cull agents if the camera is inside their bounding sphere if( length( Cv ) gt vSpherew ) compute nearest point to camera on sphere and project it float3 Pv = Cv - normalize( Cv ) vSpherew float4 vPositionSS = mul( mP float4(Pv1) ) compute radii of bounding rectangle in screen space (2x the radii in NDC) float fRadiusY = vSpherew ( Cvz tan( fCameraFOV 2 ) ) float fRadiusX = fRadiusY fCameraAspect compute UVs for corners of projected bounding square float2 vCornerNDC = vPositionSSxy vPositionSSw vCornerNDC = float2(05-05) vCornerNDC + float2( 05 05 ) vCornerNDC -= 05 float2( fRadiusX fRadiusY ) float2 vCorner0 = vCornerNDC float2 vCorner1 = vCornerNDC + float2( fRadiusX 0 ) float2 vCorner2 = vCornerNDC + float2( 0 fRadiusY ) float2 vCorner3 = vCornerNDC + float2( fRadiusX fRadiusY ) Choose a MIP level in the HiZ map (assume that width gt height) float W = fRadiusX vViewportz float fLOD = ceil( log2( W ) ) fetch depth samples at the corners of the square to compare against float4 vSamples vSamplesx = tHiZMapSampleLevel( sHiZPoint vCorner0 fLOD ) vSamplesy = tHiZMapSampleLevel( sHiZPoint vCorner1 fLOD ) vSamplesz = tHiZMapSampleLevel( sHiZPoint vCorner2 fLOD ) vSamplesw = tHiZMapSampleLevel( sHiZPoint vCorner3 fLOD ) float fMaxDepth = max( max( vSamplesx vSamplesy ) max( vSamplesz vSamplesw ) ) cull agent if the agent depth is greater than the largest of our ZMap values ofVisible = ( (vPositionSSz vPositionSSw) gt fMaxDepth ) 0 1 return o
Listing 4 Vertex shader for per-instance occlusion culling
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
72|P a g e
335CharacterManagementSystemHighLevelOverviewUsingtheaboveconceptswecannowdescribeourGPUcharactermanagementsystem in itsentiretyillustratedinpseudoͲcodeinListing5WebeginbyrenderingtheoccludinggeometryandpreparingtheHiͲZmap We thenrunallcharacters through theview frustumculling filterandstreamout theoneswhichpassTheresultsoftheviewͲfrustumpassarethenrunthroughtheocclusioncullingfilterusingaDrawAutocall The instanceswhichpasstheocclusioncullingtestarethenrunthroughaseriesofLODselectionfilterstoseparatethembyLODOncewersquovedeterminedthevisiblecharacters ineachLODwewould liketorenderallofthecharacterinstances ineachgivenLODInorderto issuethedrawcallforthatLODweneedtoknowthe instancecountObtainingthisinstancecountunfortunatelyrequirestheuseofastreamoutstatisticsqueryLikeocclusion queries stream out statistics queries can cause significant stalls and thus performancedegradationwhentheresultsareusedinthesameframethatthequeryisissuedbecausetheGPUmaygo idlewhiletheapplication isprocessingthequeryresultsHoweveraneasysolutionforthis istoreͲorderdrawͲcallstofillthegapbetweenpreviouscomputationsandtheresultofthequeryInoursystemwe are able to offset the GPU stall by interleaving scenemanagementwith the next framersquos crowdmovementsimulationThisensuresthattheGPUiskeptbusywhiletheCPUisreadingthequeryresultandissuingthecharacterrenderingcommands
RenderOccluders() RenderHiZMap() Do view-frustum culling streaming visible instances to lsquofrustumCullOutputrsquo buffer BindFrustumShader() IASetVertexBuffers( characterVB ) SOSetTargets( frustumCullOutput ) Draw( POINT_LIST CHARACTER_COUNT ) Do occlusion culling on frustum culling results streaming visible instances to lsquoocclusionCullOutputrsquo buffer BindOcclusionShader() IASetVertexBuffers( frustumCullOutput ) SOSetTargets( occulsionCullOutput ) DrawAuto( POINT_LIST ) render output of frustum culling shader Filter occlusion culling results by LOD and issue queries to read the final counts IASetVertexBuffers( occlusionCullOutput ) for(int i=0 iltLOD_COUNT i++ ) BindLODShader( LOD[i]minDistance LOD[i]maxDistance ) SOSetTargets( LOD[i]instanceDataBuffer ) LOD[i]query-gtBegin() DrawAuto( POINT_LIST ) render output of occlusion culling shader LOD[i]query-gtEnd() if possible do other CPU and GPU work here to fill out the query stall read back character counts and render characters in each LOD for( int i=0 iltLOD_COUNT i++ ) int instanceCount = LOD[i]query-gtGetPrimitiveCount() DrawInstancedCharacter( LOD[i] instanceCount )
Listing 5 Summary of our character management system
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
3 TcsftctmOdttbaipo
FccoadTmaofobt
Chapter 3 Ma
34 Cha
The traditioncomputeamshadersThiformultiplethatcanbecallsFurthetraditionalamovingthea
Ouragentscdemonstrateto 40 differetransformatibonedirectlyand a timenformationperforms itsorientation
Figure 7 Excharacter logcurrent stateoff location away Here wdelicious mu
The layout omatricesWandbone indoneaxisof tframesShaour perͲvertbonesInouthantwobon
arch of the Fr
racterAn
nalapproacmatrixpalettisisgeneralindividualsihandledusinrmoresincepproachtosanimationsa
canperformedinFigureent animatiionhierarchyintoobjectoffset withto fetch ins skinning in
xample of difgic shader
e and desired(b) User pl
we see the cushrooms (d)
of our animWeuseatextdexandthethe texturedercodetotex bone infurcasethisneinfluence
oblins
nimation
h to renderteontheCPlydoneonceintoconstanngthisappreweareusiskinningissiamplingonto
ma setofp7Eachaction sequenchyandcomptspaceDurhin that seqnterpolate an object sp
ifferent actiowhich also dd action Frolaced a noxicritter runnid) A bit of pe
mation dataturearrayweslicenumballowsus toperformthefluences byprovidesanes
ringkey framUwhichistepercharacntstoretheroachandlaingtheGPUimplynotfeotheGPU
redefinedaionhasanaces and tranputeaboneringthesimuquence Dand blend tpace and th
ons that our determines tom the left ious poison ng away fro
eaceful restin
is illustratewherethehober isusedtouse the teeanimationweight annotableperfo
med skinnethenloadedcterAlthourearestillsargecrowdsUtomanageasibleinour
ctions (walkssociatedannsitions Due transformaulationeachuring charathe key framhen transfor
Froblins cathe current a(a) Froblin cloud in the
om the hazarng restores t
ed in Figureorizontalanto indextheexture filterfetchandbnd use dynaormancegai
dcharacterintoconstaughitissomeriouslimitaofcharacteourcharactrcaseWes
kingeatingnimationseqring animatation foreachcharacteracter rendermes for eacrms the res
an perform Tanimation secarrying hise path of Frrd (c) The Fthis Froblinrsquo
e 8 The trdverticaldieanimationringhardwarblendingisgamic branchinasmosto
s is to sampntstoreformetimesposationsontheerswillstillrters(seesecsolvethispr
miningetcquenceInotion preprocchkey framisassignedaring the vech bone Eacsult accordi
These actionequence bass hard-earneroblins and Froblin is abrsquos good spiri
ransformatiomensionscosequence re to interpiveninListinhing to avoidofourvertic
7
ple theanimconsumptiosibletopacenumberofrequirenumctions32anobleminou
c) someooursystemwcessingweme that transananimationertex shadech instancedng to its p
ns are controsed on each ed treasure tas a result bout to munts
ons are stoorrespondtoVaryingtheolatebetweng6Notetd fetching zesdonotpo
73|P a g e
mationsandonbyvertexkthebonesfcharacters
merousdrawnd33)theursystemby
fwhichareweusecloseflatten thesforms thatnsequenceer uses thisd characterosition and
olled by the characterrsquos to the drop-they scatter
nch on some
ored as 3times4okeyframetimealongeen thekeythatwesortzeroͲweightossessmore
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
74|P a g e
Figure 8 Animation texture layout
Inourcasetheamountofmemoryconsumedbya fullsetofcharacteranimationdata isabout8MBwhichisareasonablesizeforourparticularapplicationUnfortunatelyalargefractionofthisiswastedspacewhich is incurredbecause thewidthof the texturemustbe largeenough toaccommodate thelongestanimationsequence Forfuturedirectionswewould liketo investigatetheuseofGPUͲfriendlysparsetexturestostoretheanimationdata Inourcasewe findthatmostoftheanimationsare fairlyshortwithonlyafewlongoutliersThiswastecouldbesignificantlyreducedbysimplypackingmultipleshortanimationsequences intoonepageofthetexturearrayandaddinga lookuptabletotheshaderwhichstoresthestartlocationforeachsequenceAnothersimplersolutionmightbetocontinueusingone sequence per page but to separate short and long sequences into separate arrays We did notpursueeitherofthesesolutions inour implementationbecausetheywouldhave introducedadditionalcomplexitytotheshadersandmemoryconsumptionwasnotenoughofan issueto justifythepossibleperformanceloss
float fTexWidth float fTexHeight float fCycleLengths[MAX_SLICE_COUNT] Texture2DArrayltfloat4gt tBones sampler sBones should use CLAMP addressing and linear filtering
Bone 1
Matrix Row 0
Matrix Row 1
Matrix Row 2
Bone 0
Matrix Row 0
Matrix Row 1
Matrix Row 2
Key 0 Key1 Key N
RGBA FP16
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
75|P a g e
void SampleBone( uint nIndex float fU uint nSlice out float4 vRow1 out float4 vRow2 out float4 vRow3 ) compute vertical texture coordinate based on bone index float fV = (nIndices[0]) (30f fTexHeight) compute offsets to texel centers in each row float fV0 = fV + ( 05f fTexHeight ) float fV1 = fV + ( 15f fTexHeight ) float fV2 = fV + ( 25f fTexHeight ) fetch an interpolated value for each matrix row and scale by bone weight vRow1 = fWeight tBonesSampleLevel( sBones float3( fU fV0 nSlice ) 0 ) vRow2 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) vRow3 = fWeight tBonesSampleLevel( sBones float3( fU fV1 nSlice ) 0 ) float3x4 GetSkinningMatrix( float4 vWeights uint4 nIndices float fTime uint nSlice ) derive length of longest packed animation float fKeyCount = fTexWidth float fMaxCycleLength = fKeyCount SAMPLE_FREQUENCY compute normalized time value within this cycle if out of range this will automatically wrap float fCycleLength = fCycleLengths[ nSlice ] float fU = frac( fTime fCycleLength ) convert normalized time for this cycle into a texture coordinate for sampling We need to scale by the ratio of this cycles length to the longest because the texture size is defined by the length of the longest cycle fU = (fCycleLength fMaxCycleLength) float4 vSum1 vSum2 vSum3 float4 vRow1 vRow2 vRow3 first bone SampleBone( nIndices[0] fU nSlice vSum1 vSum2 vSum3 ) vSum1 = vWeights[0] vSum2 = vWeights[0] vSum3 = vWeights[0] second bone SampleBone( nIndices[1] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[1] vRow1 vSum2 += vWeights[1] vRow2 vSum3 += vWeights[1] vRow3 third bone if( vWeights[2] = 0 ) SampleBone( nIndices[2] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[2] vRow1 vSum2 += vWeights[2] vRow2 vSum3 += vWeights[2] vRow3 fourth bone if( vWeights[3] = 0 ) SampleBone( nIndices[3] fU nSlice vRow1 vRow2 vRow3 ) vSum1 += vWeights[3] vRow1 vSum2 += vWeights[3] vRow2 vSum3 += vWeights[3] vRow3 return float3x4( vSum1 vSum2 vSum3)
Listing 6 Shader code to fetch interpolate and blend bone animations
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AN
7
3
F(st
Rsat(stfs
T
AdvancesinReNTatarchuk(E
76|P a g e
35 Tess
Figure 9 Ou(left) On theshaders and the tessellate
Recentgeneseries havearchitecturetessellationpas describesupportedathe firstͲclasforXboxreg36strongunder
FroblLowr
Zbrushreghighr
Table 1 Com
ealͲTimeRendeEditor)
sellation
ur system alle right the textures W
ed character
erationsofGshown trem(introducedpipelineFured in [KLEINcrossallharsscitizen fea60willuserstandingof
lincontrolcagesolutionmod
resolutionFro
mparison of
eringin3DGra
andCrow
lows rendersame chara
While using thr on the left
GPUarchitecmendous imdwithXboxrthermorew08] and [Grdwareplatfature inthetessellationhowthiste
edel
blinmodel
memory foo
aphicsandGam
wdRende
ing characteacter is rendhe same memwhereas the
cture suchamprovementxreg360) incrwiththeintrGEE08]) tessformsdesigrealͲtimed for extremchnologywo
tprint for hig
mesCoursendashS
ering
ers with extrdered withomory footprie low resolut
asXboxreg36s in geomereasednumroductionofsellation anned for thadomainNexme visual imporksistheke
Polygons
5160faces
15+Mfaces
gh and low r
IGGRAPH2008
reme details ut the use oint we are ation characte
60andATIRetry processberofdedifupcomingd displacemtgenerationxtgenerationpacthighqeytoquicka
resolution m
8
in close-up of tessellatioable to add er has much
RadeonregHDing Thesecated shadegraphicsAPment mappinand thusngames incquality andandsuccessf
Vertexa
2Ktimes2K16b
Vert
Ind
models for ou
when using on using idehigh level ofcoarser silh
D2000300include uniferunits andIssuchasDing will besolidify tescludingthosstableperfofuladoption
TotalMemory
ndindexbuffe
itdisplacemen
texbuffer~27
dexBuffer180
ur main char
tessellationentical pixel f details for
houettes
0and4000fied shaderdhardwareirect3Dreg11universally
ssellationasseauthoredormanceA
y
ers100KB
ntmap10MB
70MB
0MB
racter
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
Fc
Chapter 3 Ma
Figure 10 Ccharacter
arch of the Fr
Comparison
oblins
of low resolution modeel (top) and hhigh resoluttion model (
7
(bottom) for
77|P a g e
the Froblin
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AN
7
Hvtcodwotddc[
3
Ihho
F(vd
Gt1g
AdvancesinReNTatarchuk(E
78|P a g e
Hardware tevideo gametessellationcontrolmeshof thedesiredisplacemenwrinklesbumourmainchatruebothfordistributiondeveloperscharacter th[TATARCHUK08
351 GPU
n this sectiohardwareashardware fixoutlinedinF
Figure 11 A(also referrevertices thusdisplacemen
Goingthrougtakesaninp15X times fgenerationo
CoarseS
ealͲTimeRendeEditor)
essellationps One ofwearespechThismeshedobjectWntmappingtmpsanddenaracterThuronͲdiskstosize improwherememhe Froblin8]
UTessellati
onwewillpsusedinouxedͲfunctionigure11
An overviewed to as the s amplifyingt obtaining
ghtheproceutprimitiveor Xboxreg 3ofgraphicsc
SuperͲPrimMesh
eringin3DGra
provides sevthe main acifyingtheshisauthoredWe can thentogreatly inntsarecaptuususingtessorageandfooving loadinmory resourAdditional
ionPipelin
provideanorsystemWn tessellator
w of the tesseldquocontrol cagg the input mthe final tess
essforasing(whichwe60 or ATI Rcards)Aver
Tessella
aphicsandGam
veral keybeadvantagessurfacetopodtohavelon combine rncrease theuredbythesellationalloorsystemanng time Thces are scaroverview of
ne
overviewofWedesigned
unitavailab
ellation procgerdquo or ldquothe smesh The vesellated and
gleinputpolrefertoasaRadeonreg HDrtexshader
ator
mesCoursendashS
nefits speciis effective
ologyparamowamountorenderingooverallamodisplaceme
owsustoreddvideomemhe memoryrce Table 1f the benef
GPU tesselanAPIforableonrecen
cess We stasuper-primitertex shader
d displaced h
ygonwehaasuperͲprimD 2000Ͳ4000(whichwer
Tessellated Mesh
IGGRAPH2008
fically cruciae compressmeterizationofdetailanf this contrountofdetantmapFiguducememomoryfootprsavings are
1demonstrafits provided
lationpipeliaGPUtesselntconsumer
art by rendertive meshrdquo)r is used to high resolutio
avethefollomitive)anda0 GPU generefertoasa
h
Di
8
al for interasion of vertandanimatdjusttocapol cagewithailsHigh freure10showryfootprintintthusrede especiallyatesmemoryd by tessell
ineavailablelationpipelirGPUsThe
ring a coars The tessellaevaluate suron mesh see
wingthehaamplifiesiterations ornevaluation
Evaluatesurfacepositions
Addsplacement
active systemtex data WtiondataforpturetheovhGPU tesseequencydetsanexamplandbandw
ducingtheoy relevant fy savings foation can b
eoncurreninetakingade tessellation
se low resoator unit genrface position on the righ
ardwaretess(upto411t64X for Di
nshader)is
TessellatedanMes
ms such asWhen usingrthecoarseverallshapeellation andailssuchaseofthisfor
widthThisisverallgamefor consoleorourmainbe found in
tconsumerdvantageofnprocess is
lution mesh nerates new ons and add ht
sellatorunittrianglesorrect3Dreg 11invokedfor
dDisplacedsh
d
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
79|P a g e
eachtessellatedvertexandisprovidedwiththevertexindicesofthesuperͲprimitiveandthebarycentriccoordinatesof thevertex Theevaluationshaderuses this information tocalculate thepositionof thetessellatedvertexusingwhatevertechniqueitwishesTheleveloftessellationcanbecontrolledeitherbyaperͲdrawcalltessellationfactororbyprovidingperͲedgetessellationfactors inavertexbufferforeach triangleedge in the inputmeshWe recommend the interested reader look to [TATARCHUK08] formoredetailsaboutthespecificcapabilitiesoftheGPUtessellationoncurrentandfuturehardware352RenderingCharacterswithInterpolativeTessellationWe use interpolative planar subdivision with displacement to efficiently render our highly detailedcharacters We specify tessellation level controlling the amount of amplification per drawͲcallThereforewecanusetessellationtocontrolhowfinewearegoingtosubdividethischaracterrsquosmeshWecanusetheinformationaboutcharacterlocationonthescreenorotherfactorstocontrolthedesiredamountofdetailsFurthermoreweusethesameartassetsforrenderingthetessellatedcharacterasfortheregularconventionalrenderingusedincurrentgamesCombining tessellationwith instancing allows us to render diverse crowds of characterswithminimalmemoryfootprintandbandwidthutilizationBystoringonlyalowͲresolutionmodel(52Ktriangles)andapplying a displacement map in the evaluation shader we can render a detailͲrich 16M trianglecharacterusingvery littlememoryWecanalso limitperͲvertexanimationcomputationstotheoriginalmeshsinceweonlyneedtostoreanimationdataforthecontrolcageofthecharacterGPUtessellationallows us to provide the data toGPU at coarse resolutionwhile renderingwith high levels of detailListing7providesanexampleofthevertexshaderweusedforevaluatingtheresultingsurfacepositions
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
80|P a g e
struct VSInput float3 vPositionOS_vert0 POSITION0 float3 vNormalOS_vert0 NORMAL0 float3 vTangentOS_vert0 TANGENT0 float3 vBinormalOS_vert0 BINORMAL0 float2 vUV_vert0 TEXCOORD0 float3 vPositionOS_vert1 POSITION1 float3 vNormalOS_vert1 NORMAL1 float3 vTangentOS_vert1 TANGENT1 float3 vBinormalOS_vert1 BINORMAL1 float2 vUV_vert1 TEXCOORD1 float3 vPositionOS_vert2 POSITION2 float3 vNormalOS_vert2 NORMAL2 float3 vTangentOS_vert2 TANGENT2 float3 vBinormalOS_vert2 BINORMAL2 float2 vUV_vert2 TEXCOORD2 float3 vInstancePosWS WSInstancePosition This is per-instance data float3 vBarycentric TessCoordinates Tessellation-specific system-generated values struct VSOutput float3 vNormalWS Normal float3 vTangentWS Tangent float3 vBinormalWS Binormal float2 vUV TEXCOORD0 float fGroup GroupID float4 vPositionWS Position float4 vPositionSS SV_POSITION float4x4 mVP Texture2Dltfloatgt tDisplacement SamplerState sPointClamp SamplerState sBaseLinear float fDisplacementScale float fDisplacementBias VSOutput VS( VSInput i ) VSOutput o Interpolated tessellated vertex float3 vPositionOS = ivPositionOS_vert0 ivBarycentricx + ivPositionOS_vert1 ivBarycentricy + ivPositionOS_vert2 ivBarycentricz float3 vNormalOS = ivNormalOS_vert0 ivBarycentricx + ivNormalOS_vert1 ivBarycentricy + ivNormalOS_vert2 ivBarycentricz float3 vTangentOS = ivTangentOS_vert0 ivBarycentricx + ivTangentOS_vert1 ivBarycentricy + ivTangentOS_vert2 ivBarycentricz float3 vBinormalOS = ivBinormalOS_vert0 ivBarycentricx + ivBinormalOS_vert1 ivBarycentricy + ivBinormalOS_vert2 ivBarycentricz Interpolated texture coordinates ovUV = ivUV_vert0 ivBarycentricx + ivUV_vert1 ivBarycentricy + ivUV_vert2 ivBarycentricz Displace vertex by objects displacement map float fDisplacement = tDisplacementSampleLevel( sPointClamp ovUV 0 )r fDisplacement = fDisplacement fDisplacementScale + fDisplacementBias vPositionOS = vPositionOS + fDisplacement vNormalOS
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
L GTHsc
Fdb Hnawddeae
Chapter 3 Ma
f f f f o o o o o r
Listing 7 Ex
GiventhatwTraditionallyHoweverthspacenormachangingthe
Figure 12 Ddisplaced in be shaded us
Howeverwenormals InoaremetNawas used tdisplacemendisplacemenencoded inavailableAMensurethese
arch of the Fr
Convert po translatin somewhat f
float3 vPositfloat3 vNormafloat3 vTangefloat3 vBinor
ovPositionSSovPositionWSovNormalWS ovTangentWS ovBinormalWS
return o
xample simp
wearerendey animatedereexistsaalmapsInteactualdisp
Displacementhe directio
sing normal
e intend toordertocommely thatwo generatentandnormantmapmustthe tangen
MDGPUMeserequireme
oblins
osition and tng because wefor extra pertionWS = RotaalWS = RotaentWS = RotarmalWS = Rota
S = mul( mVPS = float4( v = vNormalWS = vTangentW
S = vBinormal
le evaluation
eringourchcharactersconcernwithatcasewelacednorma
nt of the veon of geometԢ
light thedismbineTSNMwearecomthe displa
almapsalsotbe generantͲspace norhMappertontsaremet
tangent framee are always rf ate2D( vDir ate2D( vDir ate2D( vDir ate2D( vDir
float4( vPovPositionWS S WS lWS
n shader for
aracterwithare rendehregardstoeareessental(asshown
rtex modifietric normal
splacedsurfMwithdisplputing tangacement anousedidentated at thermal mapoolwhichpr
e from objectrotating abo
vPositionOS vNormalOS )vTangentOS )vBinormalOS
ositionWS 1 1 )
r rendering i
hdisplacemered and litousingdisplatiallygeneratninFigure12
es the norma displacem
faceusing taacementmaentspacedd normal micaltangentsame timewould matcrovidessour
t space to woout y we can
) + vInstanc ) )
) )
nstanced tes
entmapweusing tang
acementmatinganewt2)
al used for ment amount
angentspacappingweduring rendemaps andspaceandmas thenormch the desicecodefor
orld space byn simplify th
cePosWS
ssellated cha
emustmakegentͲspaceappingwhentangentfram
rendering
t D The resu
cenormalmneedtoensering in thethat the gmodelsInomalmap Inred normaltangentͲspa
8
y rotating anhe math
aracters
eanoteabonormal manlightingusimeasweare
P is the oriulting point
mapusing thsureseveralsameexactgeneration ptherwords that case Ԣ By usiacegenerat
81|P a g e
nd
outlightingps (TSNM)ngtangentͲedisplacing
iginal point Prsquo needs to
heencodedconstraintsmannerasprocess forideallythethenormaling publiclyionwecan
t
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
82|P a g e
353 TessellatedCharactersLevelofDetailControlIntheFroblinsdemoweuseathreeͲlevelstaticLODschemeasdiscussedinsection332Tessellationand displacementmapping are applied only to the characters in themost detailed level In order toguaranteeastableframerateindensecrowdsituationswecomputethetessellationlevelasafunctionof the number of tessellated characters This is effective in avoiding a polygonal count explosion andretainstheperformancebenefitsofgeometryinstancingThetessellationlevelissetasfollows
ൌ ʹሺ ௫ ǡ ͳǡ ௫ሻ
HereTisthetessellationleveltobeusedforcharacterinstancesinthefirstdetaillevelNisthenumberofcharacterinstancesinthefirstdetaillevelandTmaxisthemaximumtessellationleveltouseforasinglecharacterThisschemeeffectivelyboundsthenumberoftrianglescreatedbythetessellatorandensuresthattheprimitivecountwillneverincreasebymorethanthecostoftwofullytessellatedcharacters Ifthere aremore than two such characters in the view frustum this schemewill divide the tessellatedtriangles evenly among them Naturally this can lead to popping as the size of the crowd changesdramaticallyfromoneframetothenextbut ina livelyscenewithnumerousanimatedcharactersthispoppingishardtoperceive
353 RenderingOptimizationsBecausehardwaretessellationcangeneratemillionsofadditionaltrianglesitisessentialtominimizetheamountofperͲvertexcomputationOurcharactervertexshadersalreadyuseafairlyexpensivetechniqueforskinnedanimationontheGPU(seeSection34)andperformingtheseanimationcalculations insidetheevaluationshaderiswastefulWe improve performance with a multiͲpass approach for rendering out animated characters Wecompute control cagepreͲpasswherewe can compute all relevant computations for theoriginal lowresolution mesh such as animation and vertex transformations This method is general and takesadvantageofDirect3Dreg10streamoutfunctionalityNotethatinordertoreducetheamountofmemorybeing streamed out per character as well as reduce vertex fetch and vertex cache reuse for theevaluation shader we augmented our control cagemultiͲpassmethod with vertex compression anddecompressiondescribedbelowNote thatusing thismultiͲpassmethod for control cage rendering isbeneficialnotonlyforrenderingtessellatedcharactersbutforanyrenderingpipelinewherewewishtoreuseresultsofexpensivevertexoperationsmultipletimesForexamplewecanusetheresultsofthefirstpassforouranimatedandtransformedcharactersforrendering intoshadowmapsandcubemapsforreflectionsWeperform theskinningcalculationsonlyoncepercharacteron thesuperͲprimitiveverticesand theresultsaresimply interpolatedfromthesuperͲprimitivesbytheevaluationshader Inthefirstpasswerenderallofthecharacterverticesanasinstancedsetofpointprimitivesperformskinningonthem(asdescribed insection34)andstreamouttheresults intoabuffer Inthesecond(tessellated)passtheinstance IDand superͲprimitivevertex IDsareusedby theevaluation shader to fetch the transformed
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
83|P a g e
vertexdatainterpolateanewvertexandapplydisplacementmappingNotethattheonlyquantitiesthatneedtobeoutputinthefirstpassarequantitiesaffectedbythetransformations(suchasvertexpositionsandnormalsbutnotthetexturecoordinatesorvertexcolorsforexample)Although it is helpful to stream and reͲuse the skinning calculations this alone is not very effectivebecausethevertexdatawillbestreamedatfullprecisionandtheevaluationshadermuststillpayalargecostinmemorybandwidthandfetchinstructionsinordertoretrieveitAdditionallywewouldneedtoallocatesufficientstreamoutbuffertostoretransformedverticesWeuseacompressionschemetopackthetransformedverticesintoacompact128ͲbitformatinordertoremovethisbottleneckandtoreducetheassociatedmemoryfootprintThisallowsthetessellationpasstoloadafullsetoftransformedvertexdatausingasinglefetchpersuperͲprimitivevertexAlthoughthecompressionschemerequiresadditionalALU cycles for both compression and decompression this ismore than paid for by the reduction inmemorybandwidthandfetchoperationsintheevaluationshaderWecompressvertexpositionsbyexpressingthemasfixedͲpointvalueswhichareusedtointerpolatethecornersofasufficientlylargeboundingboxwhichislocaltoeachcharacterThenumberofbitsneededdependsonthesizeofthemodelandthedesiredqualitylevelbutitdoesnotneedtobeextremelylargeInourcasethedynamicrangeofourvertexdata isroughly600cm A16Ͳbitcoordinateonthisscalegivesaresolutionofabout90micronswhichisslightlylargerthanthediameterofahumanhairWe can compress the tangent frame by converting the basis vectors to spherical coordinates andquantizingthem Sphericalcoordinatesarewellsuitedtonormalcompressionsinceeverycompressedvalue inthesphericaldomaincorrespondstoauniqueunitͲlengthvector InaCartesianrepresentation(such as thewidely usedDEC3N format) a large fraction of the space of compressed valueswill gounusedWhatthismeansinpracticeisthatamuchsmallerbitcountcanbeusedtorepresentsphericalcoordinatesatareasonablelevelofaccuracyWehavefoundthatusingan8Ͳbitsphericalcoordinatepairfornormalsresults inrendered imagesthatarecomparable inqualitytoa32ͲbitCartesianformat ThemaindrawbackofusingsphericalcoordinatesisthatanumberofexpensivetrigonometricfunctionsmustbeusedforcompressionanddecompressionbutwehavefoundthatthebenefitsofasmallcompressedformatoutweightheadditionalALUcostoncurrentgraphicshardwareTexturecoordinatesarecompressedbyconvertingtheUVcoordinates intoapairof fixedͲpointvaluesusing whatever bits are left In order to ensure acceptable precision this requires that the UVcoordinatesinthemodelbeconfinedtothe0Ͳ1rangewithnoexplicittilingoftexturesbytheartistForsmall textures a smaller bit count could be used for theUV coordinates provided that theUVs aresnappedtothetexelcenters
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
84|P a g e
Figure 13 Data format used for compressed animated vertices
Ourbit layout for the compressedvertices is shown inFigure13and corresponding compressionanddecompressioncodeisshowninListings8and9Weuse16bitsforeachcomponentofthepositiontwo8Ͳbitsphericalcoordinatesforthetangent32bitsforthenormaland16foreachUVcoordinateSinceour tangent frames are orthogonalwe refrain from storing the binormal and instead reͲcompute itbasedonthedecompressednormalandtangentSinceafull32ͲbitfieldisavailableweuseDEC3NͲlikecompression for the normal which requires fewer ALU operations than spherical coordinates Ifadditionaldatafieldsareneededwehavealsofoundthat8ͲbitsphericalcoordinatescanbeusedforthenormalataqualitylevelcomparabletoDEC3NWeexperimentedwithallofthesealternativesontheATIRadeonTMHD4870GPUbutfoundlittlepracticaldifferenceinperformanceorqualitybetweenanyofthemWebelievethatthecompressedformatthatweuseherewouldalsomakeanexcellentstorageformatforstaticgeometry Inthiscase(andalsoforthecaseofnonͲinstancedcharacters)thedecompressioncouldbeacceleratedby leveraging thevertex fetchhardware toperform someof the integer to floatconversions We cannotdo this inour casebecausewemustexplicitly fetchvertexdatawithbufferloadsusingtheinstanceIDofthecharacterinsteadofusingthefixedfunctionvertexfetchWeobtaineda25performance improvementviaourmultiͲpass techniqueandweobservedgainsashighas37with using the compression scheme Due to quantization used for compression there are subtledifferencesbetweenthetwoimagesduetocompressionHowevertheseartifactsaredifficulttonoticein a dense dynamic crowd of animated characters and even difficult to discern in static comparisonscreenshots
X (16)
Y (16)
Z (16)
Nx (10)
Ny (11)
Tș (8)
Tij (8)
U (16)
V (16)
32 Bits
R
G
B
A
Nz (11)
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
85|P a g e
Quantizes a floating point value (0-1) to a certain number of bits uint Quantize( float v uint nBits ) float fMax = ((float) (1 ltlt nBits))-10f return uint( round(vfMax) ) uint PackShorts( uint nHigh uint nLow ) return (nHigh ltlt 16) | (nLow) uint PackBytes( uint nHigh uint nLow ) return (nHigh ltlt 8) | (nLow) Converts a vector to spherical coordinates Theta (x) is in the 0-PI range Phi (y) is in the -PIPI range float2 CartesianToSpherical( float3 cartesian ) cartesian = clamp( normalize( cartesian ) -11 ) beware of rounding error float theta = acos( cartesianz ) float s = sqrt( cartesianx cartesianx + cartesiany cartesiany ) float phi = atan2( cartesianx s cartesiany s ) if( s == 0 ) phi = 0 prevent singularity if normal points straight up return float2( theta phi ) Converts a normal vector to quantized spherical coordinates uint2 CompressVectorQSC( float3 v uint nBits ) float2 vSpherical = CartesianToSpherical( v ) return uint2( Quantize( vSphericalNormx PI nBits ) Quantize( (vSphericalNormy + PI ) ( 2PI ) nBits ) ) Encodes position as fixed-point lerp factors between AABB corners uint3 CompressPosition( float3 vPos float3 vBBMin float3 vBBMax uint nBits ) float3 vPosNorm = saturate( (vPos - vBBMin) (vBBMax-vBBMin) ) return uint3( Quantize( vPosNormx nBits ) Quantize( vPosNormy nBits ) Quantize( vPosNormz nBits ) ) uint PackCartesian( float3 v ) float3 vUnsigned = saturate( (vxyz 05) + 05 ) uint nX = Quantize( vUnsignedx 10 ) uint nY = Quantize( vUnsignedy 11 ) uint nZ = Quantize( vUnsignedz 11 ) return ( nX ltlt 22 ) | ( nY ltlt 11 ) | nZ uint4 PackVertex( CompressedVertex v float3 vBBoxMin float3 vBBoxMax ) uint3 nPosition = CompressPosition( vvPosition vBBoxMin vBBoxMax 16 ) uint2 nTangent = CompressVectorQSC( vvTangent 8 ) uint4 nOutput nOutputx = PackShorts( nPositionx nPositiony ) nOutputy = PackShorts( nPositionz PackBytes( nTangentx nTangenty ) ) nOutputz = PackCartesian ( vvNormal ) nOutputw = PackShorts( Quantize( vUVx 16 ) Quantize( vUVy 16 ) ) return nOutput
Listing 8 Compression code for vertex format given in Figure 13
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
86|P a g e
float DeQuantize( uint n uint nBits ) float fMax = ((float) (1 ltlt nBits)) - 10f return float(n)fMax float3 DecompressVectorQSC( uint2 nCompressed uint nBitCount ) float2 vSph = float2( DeQuantize( nCompressedx nBitCount ) DeQuantize( nCompressedy nBitCount ) ) vSphx = vSphx PI vSphy = (2 PI vSphy) - PI float fSinTheta = sin( vSphx ) float fCosTheta = cos( vSphx ) float fSinPhi = sin( vSphy ) float fCosPhi = cos( vSphy ) return float3( fSinPhi fSinTheta fCosPhi fSinTheta fCosTheta ) float3 DecompressPosition( uint3 nBits float3 vBBMin float3 vBBMax uint nCount ) float3 vPosN = float3( DeQuantize( nBitsx nCount) DeQuantize( nBitsy nCount) DeQuantize( nBitsz nCount) ) return lerp( vBBMinxyz vBBMaxxyz vPosN ) float3 UnpackPosition( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) uint3 nPos nPosxy = uint2( nPackedx gtgt 16 nPackedx amp 0x0000ffff ) nPosz = nPackedy gtgt 16 return DecompressPosition( nPos vBBoxMin vBBoxMax 16 ) float2 UnpackUV( uint4 nPacked ) uint2 nUV = uint2( nPackedw gtgt 16 nPackedw amp 0x0000ffff ) float2 vUV = float2( DeQuantize( nUVx 16 ) DeQuantize( nUVy 16 ) ) return vUV float3 UnpackTangent( uint4 nPacked ) uint2 nTan = uint2( (nPackedy gtgt 8) amp 0xff nPackedy amp 0xff ) return DecompressVectorQSC( nTan 8 ) float3 UnpackCartesian( uint n ) uint nX = (n gtgt 22) amp 0x3FF uint nY = (n gtgt 11) amp 0x7FF uint nZ = n amp 0x7FF float fX = (20f DeQuantize( nX 10 )) - 10f float fY = (20f DeQuantize( nY 11 )) - 10f float fZ = (20f DeQuantize( nZ 11 )) - 10f return float3( fX fY fZ ) CompressedVertex UnpackVertex( uint4 nPacked float3 vBBoxMin float3 vBBoxMax ) CompressedVertex vVert vVertvPosition = UnpackPosition( nPacked vBBoxMin vBBoxMax ) vVertvNormal = UnpackCartesian( nPackedz ) vVertvTangent = UnpackTangent( nPacked ) vVertvBinormal = normalize( cross( vVertvTangent vVertvNormal ) ) vVertvUV = UnpackUV( nPacked ) return vVert
Listing 9 Decompression code for vertex format given in Figure 13
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
87|P a g e
354 Displacement Map Tips and Ensuring Watertightness Wewould liketoshareseveralpracticaltipsforgenerationandusingofdisplacementmapsthatwersquovelearned throughout our process Firstly themethod used for generation of displacementmapsmustmatch themethod forevaluating subdivided surfaceThisnaturally correlates to theabsoluteneed toknowtheprocessusedbythemodelingtoolusedformapgenerationsManyDCCtoolssuchasAutodeskMayaregwillfirstuseapproximatingsubdivisionprocesssuchasCatmullͲClarksubdivisionmethodonthecontrolmesh(thelowresolutionorsuperͲprimitivemesh)OncethemeshhasbeensmoothedthenthefineͲscaledetailsarecapturedintoascalarorvectordisplacementmapWhenusingthesetoolswemustevaluatethefinalsurfaceusingCatmullͲClarksubdivisionmethodsHowevertheevaluationshadersarereasonablyexpensivewhichprecipitatedourdecision touse interpolativeplanarsubdivisiondue to itsextreme simplicity for evaluation Additionally a number of concerns arise with topology fixͲup andtreatment of extraordinary points as well as patch reordering to ensure watertightness duringdisplacementHowevershould the interested readermaywish to investigate furtheraboutusingGPUtessellation for CatmullͲClark surfaces they can find additional material and further references in[TATARCHUK08]Inour caseweused theAMDGPUMeshMapper tool ([AMDGMM08])designed specifically for robustgenerationofdisplacementmapsforinterpolativeplanarsubdivisionSpecificallygivenapairoflowandhighresolutionmeshesthistoolprovidesanumberofdifferentoptionsforcontrollingtheenvelopesforraycasting from low tohigh resolutionmap inorder tocapturedisplacementandnormal informationFurthermoreinordertoachievecontrollableresultsatrunͲtimewemustknowtheexactfloatingpointvalues for displacement scale and bias for the generated displacementmap This tool provides thisinformation collected during the generation process in the form of parameterswhich can be useddirectlyintheshaderParticularcareneedstobepaidduringdisplacementmapping inordertogenerateresultingwatertightsurfaces This is true regardless of the subdivisionmethod used for evaluation One challenge withrenderingcomplexcharacterswithdisplacementmapsthatcontaintextureuvbordersistheintroductionoftextureuvseams (seeFigure14 foranexampleofsuchadisplacementmap)Unlessneighboringuvbordersare laidoutwith thesameorientationsand lengthsdisplacingwith thesemapswill introducegeometry cracks along theuvborders (Figure14) Thishappensdue tobilineardiscontinuities and tovarying floating point precision on different regions of the texturemap Seamless parameterizationsremovebilinearartifactsbutdonotsolvefloatingpointprecisionissuesOneͲtoͲoneparameterizationisextremelydifficulttoobtainforcomplexcharactersifnotimpossibleinpracticalscenariosWesolvethisproblemduringthemapgenerationprocessratherthanatrunͲtimeviaadditionalfeaturesimplementedas part of the GPUMeshMapper toolWe postͲprocess our displacementmaps by correcting all thetextureuvbordersduringthedisplacementmapgenerationbyidentifyingthebordertriangleedgesandperformingfilteringacrossedges(withadditionalfixͲup)toalleviatedisplacementseams
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
88|P a g e
Figure 14 Example of a visible crack generated due to inconsistent values across the edges of displacement map for this character On the left we highlighted the specific edges along the seam Note that the adjacent edges for this seam do not have uniform parameterization
36 LightingandShadowing
361 RenderingShadowsinLargeScaleEnvironmentHighquality renderingsystem requiresdynamicshadowscastbycharactersonto theenvironmentandthemselves Tomanage shadowmap resolution our system implements Parallel Split ShadowMaps[ZSXL06]TheviewͲfrustumtestdescribed inSection33 isusedtoensurethatonlycharactersthatarewithinaparticularparallelsplit frustumare renderedOcclusioncullingcouldalsobeused forshadowmapsaswellbutwedonotdothis inoursystembecauseonlycharactersandsmallersceneelementsare rendered into the shadowmapsand there is little to cull the charactersagainst (shadows castbyterrainarehandledseparately)WeuseaggressivefilteringforgenerationofsoftshadowsThisallowsustousefurthermeshsimplificationfortheLODrenderedintoshadowmapsForcharactersinthehigherͲdetailshadowfrustaweusethesamesimplifiedgeometrythatisusedforthemostdistantlevelofdetailduringnormalrenderingFormoredistantshadowswecanuseamoreextremesimplification
Displacement map Rendered character using the displacement map with a seam
Zoomed-in view shows a crack in the surface
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
3
F[cIdp[Semplaubd3Omdmsatg
Chapter 3 Ma
362Ligh
Figure 15 A[left] the fulcheckerboard
nthissectiodynamicdayprecompute[CHEN08]AsSHLM isqueexample tomap a SHLMprovidingdeightmaplikandpropsWusedthisdaby our charadynamiccha
3621SH
Ouroutdoormodeledasadynamicranmatches oursamplesareapproximatethesamplesgroundonly
arch of the Fr
hting
A medium relly shaded ted pattern ov
onwedescriynightcycled incident lisingletexeleriedand thcompute aM can be setailed lightiearadianceWebrieflymtafor lightinacters intoractershado
HLMGenera
rsceneiscoadirectionageenvironmr desired ligtakenata
elyhalfofouwouldworkcaptureap
oblins
esolution sperrain [centverlay to indi
bethelightieourglobaightingontinanSHLMhe lightingediffuse lighstored at angresults (FecacheforcmotivateandngFinallywthe static sowsoverlap
ation
mprisedoftllightsourcementmapTghtmap respoint justaurcharacterrsquokwellforligpartial lighti
herical harmter] just the icate light m
ingsystemulscene lightheterrainWstoresacomenvironmenthting termlower resoFigure15)Tcomputinglidiscussourwepresentscenewhilestaticterrai
twomaingloeTheseconTogeneratesolution of 1above the tersquosheightashtingbothtingenvironm
monic light lighting fro
map texel den
usedintheAting isstaticWechosetompletelightit isevaluateBecause theolution thanThisdecoupightingonscrmethodforasimpletecavoiding ldquodinshadows
oballightemndaryemittetheSHLMt1024 times 1024errainSampshowninFitheterrainamentandar
map is usedom the sphernsity
AMDFroblinscwhichenabouseaSphingenvironmed in thedire shadingnthe finest
plingalsoenceneelemenrgeneratingchniquefordouble shad
mittersTheeristheskyitheterrainis4At the ceplesare takgure16Thiaswellastherepoorly su
d to light a hrical harmon
sdemoTheblesustouericalHarmmentatthatrectionof thormal isdelevel of sh
nablesustontssuchasogtheSHLMaintegratingdowingrdquo arti
primaryemitselfwhichsdividedunenter of eacenatadistisheightwasecharactersuited for sha
8
highly detainic light map
edemodoessea lightmonicLightMtpointAtruhe shadingcoupled froading detaiusetheterourdynamicandthenshdynamicshaifacts in reg
itteristhesismodeleduniformlyintoch grid squatanceoff theschosentosSamplestaadingpoints
89|P a g e
iled terrain ap [right] a
snothaveamaptostoreMap (SHLM)untimethenormal form the lightl while stillrainrsquosstaticccharactersowhowweadowscastgionswhere
sunwhichisusingahighoagridthatare lightinge terrainofensurethatakenonthesabove the
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AN
9
tstpc
FsnhaseAdtmrp
3Fsulfrcsn
AdvancesinReNTatarchuk(E
90|P a g e
terrainthatsamplestakethiscan leadpracticewecompletelig
Figure 16 Tsample takennot suitable hemisphere oapproximateshading chaenvironment
Ateach samdirect light ftesting foromodestrecuradianceistpointtexture
3622Re
For renderinsuggestedusus tocomprighting resufoundthatwratherthanacouldbestospherical hanegativevalu
ealͲTimeRendeEditor)
mayrequireentoofarofdtomissingfoundthathtingenviro
The lighting n on the terr
for shadingof the lightin
ely half of oracters as wthat capture
mplepointdfrom the suocclusion Inursion limithenstoredesusingthe
enderingUs
ngwe packsingvariousress thedatultsparticulawecouldgetahigherresredusingasrmonic coeues
eringin3DGra
ethemissinffthegroungorunnaturtakingsamp
onmentsthat
environmenrain surfaceg characterng environmour charactewell as the es bounced l
directand inn is computdirect lightinalldirectusing3rdordOpenEXRim
singSHLM
k the 3rd ordcompressioaandwe foarly inareasthigherquaolutioncomsharedexpofficients can
aphicsandGam
g lowerhemdcancreateralterrainshplesatamotwereusefu
nt is capturee does not cors that may
ment [Centerersrsquo height terrain itse
lighting from
ndirect lighttedby castifromthesutionsonthedersphericamagefilefor
der spectraonschemes[ound that thsofhighconlitylighting
mpressedSHLonenttexturnnot be sto
mesCoursendashS
misphereofeartifactswhadowbounoderateheigulforshadin
ed at a poinontain usefu
y have shadr] A sample ensuring th
elf [Right] Am the ground
t iscapturedingadistribunand fromsphereusialharmonicsmat
l SHLM dat[WWS07][heuncomprntrastsuchresultsbystLMWedideformat(RGored in this
IGGRAPH2008
the lightingwhenusedfondariesaswghtabovethgboththet
nt by firing rul lighting dading normals
point is offshe captured A characterd below
dandprojecbutionof raythesky iscingastratifisThisdata
a into seveHU08]butoressedcoeffasshadowtoringalowobservethaGBE)withmway becau
8
genvironmeorshadingthwellas incorheterrainwterrainandt
rays into theata in the los that poin
fset from the lighting env
r requires a
cted into spys in thedicollectedbyiedsamplingiswrittento
n RGBA16Fourmemoryficientsgaveboundaries
werresolutioattheDCcominimallossse the form
entOntheoheterrainInrectselfͲref
wewereablehecharacte
e environmeower hemispt down intoterrain at avironment is full spheric
phericalharmrectionof ty firing800rgschemeTodiskas16Ͳ
texturesObudgetdide slightlyhig In thesesconuncompremponentsoinqualityh
mat does no
otherhandnparticularlectance Inetocapturers
ent [Left] A phere and is o the lower a distance of s useful for cal lighting
monicsThehe sunandrayswithaThe incidentͲbitfloating
Others havenotrequiregherqualitycenariosweessedSHLMoftheSHLMhigherorderot allow for
f
g
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
FlldquotSarttsddOwdtete
Fmos
Chapter 3 Ma
Final lightingight from thldquoresidualrdquoenthelistingat
Sinceourchaamore tradrendertheirthiswouldinterrainselfosunrsquos contribdirectionallidiscussionon
Oncethedowe test if thdirectionthethe vectorseffectsofthetoattenuateenvironment
Figure 17 mountain thoccluded regshadowing a
arch of the Fr
g is computhe linear tenvironmenttheendoft
aractersareditional realͲshadowsWncuradoublocclusionasbution to thghtfromthenextracting
ominantdirehis lightrsquosdienwedetermdisagree theshadowmethedominatPleasesee
Characters he right halfgion of the
artifacts
oblins
ed inapixerms then slighting [SLOthissection
dynamicthͲtime shado
WedidnotweshadowingshowninFihe lightmaelightingenadominant
ectional lightrection corrminethatthhen the pixemaparefadeantlightingtethesample
cast shadowlf is in diree terrain [B
el shaderbysumming theOAN08]Dire
heirshadowsowingmethowanttosimpgartifactinigure17Idep This cannvironmentitlightfroma
t isremovedrespondswihepixelisinel is considedoutOncetermwhichecodeprovid
ws on the teect sun lightBottom] A
y sampling te contributiectXregHLSLs
scannotbeod parallelͲlydarkenthregionsthateallywewoube very nicintheterraiasphericalh
dfromthe lith the sunrsquosdirectsunlered to alretheadjusteisthenaddededatthee
errain The t [Top] Chshadow cor
theSHLM ron of this dshadercode
bakedintoͲspit shadoweterrainwhtarealreadyuldlikethecely approxnrsquospixelshaharmonicligh
ightingenvisdirectionightandtheeady be occedshadowinedtotheremndofthisse
left half ofharacters inrrection fac
removing thdominant diedemonstra
theSHLMawmappinghereverachyshadowedshadowmapximated by saderPleasehtingenviron
ironmentsaIfboth veceshadowmacluded fromgtermiscomainingspheection
f the image correctly ca
ctor is appl
9
edominantirectional ligating this is
sapreproce[ZSXL06]waractercastinthelightptoonlyattseparating areferto[SLOnment
mpledfromtorspoint iapshouldbe the sun anmputeditiericalharmo
is in the shast double slied to prev
91|P a g e
directionalght and theprovided in
essInsteadwas used tosashadowmapduetotenuatethea dominantOAN08]fora
mtheSHLMn the sameeappliedIfnd thus thesthenusedoniclighting
hadow of a shadows on vent double
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
92|P a g e
WealsousetheterrainrsquosSHLMforlightingourcharactersandsceneprops(Figure18)Inthecharactersrsquopixelshader thepointbeingshaded isprojectedonto the texturespaceof theSHLMandsamplesaretakenwhicharethenusedtoapproximatealightingenvironmentforthecharacterThisdoesnotprovideanylightingvariationalongtheverticalaxisbutinpracticeitworksquitewellevenfortallsceneelementssuchasthetent inthefigurersquosforegroundorthepagoda inthefigurersquosbackgroundAdditionaltexturemapscouldbeusedtostoreverticalgradientsforthesphericalharmoniccoefficientsthiswouldprovidemoreaccuratelightingenvironmentreconstructionforpointslocatedabovetheterrain[AKDS04]
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
Fe
Chapter 3 Ma
Figure 18 Denvironment
arch of the Fr
Dynamic chafor shading
oblins
aracters (bog by sampling
ttom) and otg from the te
ther static scerrainrsquos SHL
cene props (LM
(top) build a
9
an approxim
93|P a g e
mate lighting g
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
94|P a g e
Samplers sampler g_sSHLMPoint sampler g_sSHLMBilinear sampler g_sSHLMTrilinear sampler g_sSHLMAnisotropic SHLM Textures Texture2Dltfloat4gt tSHLM_R0 DC amp linear (red) Texture2Dltfloat4gt tSHLM_R1 first 4 quadratic (red) Texture2Dltfloat4gt tSHLM_G0 DC amp linear (green) Texture2Dltfloat4gt tSHLM_G1 first 4 quadratic (green) Texture2Dltfloat4gt tSHLM_B0 DC amp linear (blue) Texture2Dltfloat4gt tSHLM_B1 first 4 quadratic (blue) Texture2Dltfloat3gt tSHLM_RGB2 Global parameters set by the application float2 g_vSHLMEnvironmentScale float3 g_vSHLMSunDirectionWS ========================================================================== Evaluate a SH basis functions for a given direction ========================================================================== void SHEvalDirection ( float3 vDirection out float4 vOut[3] ) float3 vDirection2 = vDirection vDirection vOut[0]x = 0282095 vOut[0]y = -0488603 vDirectiony vOut[0]z = 0488603 vDirectionz vOut[0]w = -0488603 vDirectionx vOut[1]x = 1092548 vDirectionx vDirectiony vOut[1]y = -1092548 vDirectiony vDirectionz vOut[1]z = 0315392 (30vDirection2z - 10) vOut[1]w = -1092548 vDirectionx vDirectionz vOut[2]x = 0546274 (vDirection2x - vDirection2y) Last three channels go unused vOut[2]yzw = 00 return ========================================================================== Turn world space position into light map UV ========================================================================== float2 ComputeLightMapUV ( float3 vPositionWS ) float2 vUV = (vPositionWSxz g_vSHLMEnvironmentScale) + 05 return vUV ========================================================================== Assemble the SH coefficients amp Dominant light info SH is 3rd order (9 coefficients per color channel) Coefficients are stored in an array of float4 vectors the last the components of the last float4 vector in each array go unused Inputs vUV - Texture coord sLightMapSampler - sampler state Outputs vSHr[] - Residual lighting environment (dominant light removed) vSHg[] - Residual lighting environment (dominant light removed) vSHb[] - Residual lighting environment (dominant light removed) cDominantColor - Dominant directional light color vDominantDir - Dominant directional light direction ========================================================================== void GetLightingEnvironment ( float2 vUV sampler sLightMapSampler out float4 vSHr[3] out float4 vSHg[3]
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
95|P a g e
out float4 vSHb[3] out float3 cDominantColor out float3 vDominantDir ) vSHr[0] = tSHLM_R0Sample( sLightMapSampler vUV ) DC amp linear terms vSHg[0] = tSHLM_G0Sample( sLightMapSampler vUV ) DC amp linear terms vSHb[0] = tSHLM_B0Sample( sLightMapSampler vUV ) DC amp linear terms vSHr[1] = tSHLM_R1Sample( sLightMapSampler vUV ) first 4 quadratic vSHg[1] = tSHLM_G1Sample( sLightMapSampler vUV ) first 4 quadratic vSHb[1] = tSHLM_B1Sample( sLightMapSampler vUV ) first 4 quadratic final quadratic (red green blue) float3 vTmp = tSHLM_RGB2Sample( sLightMapSampler vUV ) vSHr[2]x = vTmpr last 3 channels of vSHr[2] go unused vSHg[2]x = vTmpg last 3 channels of vSHr[2] go unused vSHb[2]x = vTmpb last 3 channels of vSHr[2] go unused extract dominant light direction from linear SH terms vDominantDir = (vSHr[0]yzw 03 + vSHg[0]yzw 059 + vSHb[0]yzw011) vDominantDir = normalize( float3(-vDominantDirzx vDominantDiry) ) turn dom direction into an SH directional light with unit intensity float4 Ld[3] SHEvalDirection( vDominantDir Ld ) Ld[0] = 295679308573 factor to make it unit intensity Ld[1] = 295679308573 Ld[2] = 295679308573 float fDenom = dot(Ld[0]Ld[0])+dot(Ld[1]Ld[1])+(Ld[2]xLd[2]x) find the color of the dominant light cDominantColorr = (dot(Ld[0]vSHr[0])+dot(Ld[1]vSHr[1])+(Ld[2]xvSHr[2]x))fDenom cDominantColorg = (dot(Ld[0]vSHg[0])+dot(Ld[1]vSHg[1])+(Ld[2]xvSHg[2]x))fDenom cDominantColorb = (dot(Ld[0]vSHb[0])+dot(Ld[1]vSHb[1])+(Ld[2]xvSHb[2]x))fDenom subtract dominant light from original lighting environment so we dont get double lighting vSHr[0] = vSHr[0] - Ld[0]cDominantColorr vSHg[0] = vSHg[0] - Ld[0]cDominantColorg vSHb[0] = vSHb[0] - Ld[0]cDominantColorb vSHr[1] = vSHr[1] - Ld[1]cDominantColorr vSHg[1] = vSHg[1] - Ld[1]cDominantColorg vSHb[1] = vSHb[1] - Ld[1]cDominantColorb vSHr[2]x = vSHr[2]x - Ld[2]xcDominantColorr vSHg[2]x = vSHg[2]x - Ld[2]xcDominantColorg vSHb[2]x = vSHb[2]x - Ld[2]xcDominantColorb ========================================================================== Compute the amount of shadow to apply fShadow comes from a shadow map lookup ========================================================================== float ComputeDirectLightingShadowFactor ( float3 vDominantLightDir float fShadow ) in order to avoid double darkening we figure out how much the dominant light matches up with the actual directional light source and then use that to figure out how much extra darkening we should apply thresholds for fading in shadow the cosine of the angle between the two vectors is mapped to the [01] range these thresholds mark the points within that threshold that the shadow is faded in Tweak these to change the range over which the shadow is faded inout static const float fShadowStart = 045 start fading at ~63 degrees static const float fShadowStop = 095 full shadow at ~18 degrees
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
96|P a g e
smoothstep to fade inout shadow Dot product is scaledbiased from [-11] into [01] range threshold terms determine where the fade inout boundaries are we call this exposure to sun because it approximates how exposed you are to the sun and thus how much shadow should be allowed float fAngle = dot(vDominantLightDirg_vSHLMSunDirectionWS)05+05 float fExposureToSun = smoothstep(fShadowStartThreshold fShadowStopThreshold fAngle) amount of dominant light to remove float fPercentShadowed = lerp( 1 fShadow fExposureToSun) return fPercentShadowed ========================================================================== Compute shadowed diffuse lighting We pass the dominant light direction and dominant light color back to the caller so that it may be used for specularglossy calculations The adjusted shadow factor is passed back in the alpha channel of the returned vector so that it may be used for shadowing any specularglossy shading terms that the caller computes ========================================================================== float4 ComputeDiffuse ( float3 vPositionWS float3 vNormalWS float fShadow out float3 vDominantLightDir out float3 cDominantLightColor ) compute a texture coord for the light map float2 vUV = ComputeLightMapUV( vPositionWS ) get the lighting environment float4 vSHLightingEnvR[3] vSHLightingEnvG[3] vSHLightingEnvB[3] GetLightingEnvironment( vUV g_sSHLMBilinear vSHLightingEnvR vSHLightingEnvG vSHLightingEnvB cDominantLightColor vDominantLightDir ) build basis for lambertian reflectance function float4 vSHLambert[3] SHEvalDirection( vNormalWS vSHLambert ) the lambertian SH convolution coefficients for the first three bands float3 vConvolution = float3( 10 2030 1040 ) vSHLambert[0] = vConvolutionxyyy vSHLambert[1] = vConvolutionzzzz vSHLambert[2]x = vConvolutionz apply shadow to the direct dominant light float fShadowFactor = ComputeDirectLightingShadowFactor( vDominantLightDir fShadow ) cDominantLightColor = fShadowFactor direct diffuse lighting (from dominant directional light) float3 cDiffuse = max( 0 dot(vNormalWSvDominantLightDir) ) cDominantLightColor diffuse light from lighting environment (dominant light removed) cDiffuser += dot( vSHLambert[0] vSHLightingEnvR[0] ) DC amp linear cDiffuseg += dot( vSHLambert[0] vSHLightingEnvG[0] ) cDiffuseb += dot( vSHLambert[0] vSHLightingEnvB[0] ) cDiffuser += dot( vSHLambert[1] vSHLightingEnvR[1] ) quadractic cDiffuseg += dot( vSHLambert[1] vSHLightingEnvG[1] ) cDiffuseb += dot( vSHLambert[1] vSHLightingEnvB[1] ) cDiffuser += vSHLambert[2]x vSHLightingEnvR[2]x cDiffuseg += vSHLambert[2]x vSHLightingEnvG[2]x cDiffuseb += vSHLambert[2]x vSHLightingEnvB[2]x cDiffuse = max( 0 cDiffuse ) return float4(cDiffuse fShadowFactor)
Listing 10 HLSL shader code implementing the spherical harmonic light map techniques described in this section
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
C
3 IcmcierWeamfeit
3 WSpaaHbedB
Chapter 3 Ma
37 Con
n this chapcharacters umethods forcharacterͲtontelligent chentirelyontrangingfromWe renderextremeclosartparallelamassivecrowfor high quaenvironmentnteractive rtriangleson
38 Ack
WewouldlikSpecificallypainstakingabsolutelycralso like to tHensley JasbleedingͲedgextremely hdisplacemenBudirijantoP
arch of the Fr
nclusion
terwe coveusing the mr computingͲcharacter characters ttheGPUThmcoarsestlethousandsseͲupstofarartificial intewdrenderinality closeͲuts and anrates (over 2averageat2
nowledg
ketothankaAbeWileyattention trucialtothethank the foon Yang angeadvancedhelpful byntmapswithPurnomofor
oblins
ered variousmassive parag dynamic pcollisions Inhe FroblinseFroblinsdevelatonly9of animatedrawayldquobirdelligencecomgwithLODups and stabadvanced20 fps onA20Ͳ25fps)w
ements
allthecreatour lead ao detail Thsuccessoftollowing foldRajaKodud features sworking togh tessellatiortheirwork
s aspects ofallelization apath findingn our large(frog goblin
democontai900polygond intelligentdrsquoseyerdquoviewmputation fmanagemenble performglobal illumATI Radeonregwhilemainta
ivehardͲwortist deservhe talentedthisprojectks fromAMuri aswellspecificallygether onon inmindonthistool
f simulatingavailable ong using gloeͲscale envins) are conns3000chansallthewat characterswsoftheentfordynamicntwithhighmance terraimination sysregHD 4870)iningtheful
orkingandoves a speciad artists froandwersquodli
MDGameCoasAMD graTimKelleya tool speandwersquodp
g and renden the latestobal modelronment wncurrently saractersrenaytoover1s from a vatiresystempathfindingendrenderin system cstem We awith staggellhighqualit
overͲallfunfalmentionom ChaingunketothankomputingGraphicsdriveandMatt Joecifically departicularly l
ering large ccommodityand local aith thousanimulated anderingatv6Mtriangleariety of vieOursystemgand localaingcapabilitcascaded share able toering polygotylightingan
olkswhocofor all his pn Studios athemforthroupDanArdeveloperohnsonThesigned forlike to than
9
crowds of ay GPUs Weavoidance fonds of highnimated anarious levelssatextremeewpoints racombinessavoidanceotiessuchashadows forrender ou
on count (6ndshadowin
ntributedtopatience diland Exigenteircontribu
AbrahamsͲGerswhohelpeGPU toolsrobust genkPeter Loh
97|P a g e
utonomouse describedor handlingly detailedd renderedsofdetailsecloseͲupsnging fromtateͲofͲtheͲon theGPUtessellationlargeͲrangeur world atndash 8million
ngsolution
othisdemoligence and were alsotionsWersquodessel Justined stabilize groupwasneration ofhrmann and
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
98|P a g e
39 References [AHH08]AKENINEͲMOumlELLERTHAINESEANDHOFFMANN2008RealͲTimeRendering3rdedAKPeters
Ltd[AMDGMM08]AMDGPUMeshMapper2008
httpdeveloperamdcomgpuMeshMapperPagesdefaultaspx[AKDS04]ANNENTKAUTZ JDURANDFANDSEIDELHP2004SphericalHarmonicGradients forMidͲ
RangeIlluminationRenderingTechniques2004EurographicsSymposiumonRendering[CHEN08]CHENH2008LightingandMaterialofHALO3GameDeveloperrsquosConference(SanFrancisco)[EVERITT01] EVERITT C 2001 Interactive OrderͲIndependent Transparency Technical report NVIDIA
Corporation[FIORINISHILLER98]FIORINIPANDSHILLERZ1998MotionPlanninginDynamicEnvironmentsUsingVelocity
ObstaclesInternationalJournalonRoboticsResearch17(7)760Ͳ772[GEE08]GEEK2008Direct3Dreg11TessellationPresentationGamefest2008SeattleWAJuly2008[GKM93] GREENE N KASS M AND MILLER G 1993 Hierarchical ZͲbuffer visibility In SIGGRAPH rsquo93
Proceedingsofthe20thannualconferenceonComputergraphicsand interactivetechniquesACMNewYorkNYUSApp231ndash238
[HARADA07]HARADAT2007RealͲTimeRigidBodySimulationonGPUsInGPUGems3NguyenHedAddisonͲWesley
[HU08]HUY2008LightmapCompressioninHALO3GameDeveloperrsquosConference(SanFrancisco)[JEONGWHITAKER07A] JEONGWͲK ANDWHITAKER RT 2007 A Fast Eikonal Equation Solver for Parallel
SystemsSIAMconferenceonComputationalScienceandEngineering2007TechnicalSketches[JEONGWHITAKER07B]JEONGWͲKANDWHITAKERRT2007AFastIterativeMethodforaClassofHamiltonͲ
JacobiEquationsonParallelSystemsUniversityofUtahSchoolofComputingTechnicalReportUUCSͲ07Ͳ010
[KLEIN08]KLEINA2008IntroductiontotheDirect3Dreg11GraphicsPipelinePresentationGamefest2008SeattleWAJuly2008
[MHR07] MILLAacuteN E HERNAacuteNDEZ B AND RUDOMIN I 2007 Large Crowds of Autonomous AnimatedCharactersUsingFragmentsShadersandLevelofDetailShaderX5AdvancedRenderingTechniquesEngelW(Editor)CharlesRiverMediaDecember2006
[SLOAN08] SLOAN P 2008 Stupid Spherical Harmonic (SH) Tricks Game Developerrsquos Conference (SanFrancisco)
[TATARCHUK08]TATARCHUKN2008AdvancedTopicsinGPUTessellationAlgorithmsandLessonsLearnedPresentationGamefest2008SeattleWAJuly2008
[TCP06]TREUILLEACOOPERSANDPOPOVIZ2006ContinuumCrowdsACMTransGraph253(Jul2006)pp1160Ͳ1168BostonMA
[TSITSIKLIS95]TSITSIKLISJN1995EfficientAlgorithmsforGloballyOptimalTrajectoriesIEEETransactionsonAutomaticControl409(Sept)1528Ͳ1538
[VBPS08]VANDENBERGJ PATILSSEWALLJMANOCHADANDLINM2008 InteractiveNavigationofMultipleAgents inCrowdedEnvironmentsInProceedingsofthe2008Symposiumon interactive3DGraphicsandGames(RedwoodCityCaliforniaFebruary15Ͳ172008)SI3D08ACMNewYorkNY139Ͳ147
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
99|P a g e
[WWS07]WANGLWANGXSLOANPWEILTONGXANDGUOB2007RenderingfromCompressedHighDynamicRangeTexturesonProgrammableGraphicsHardwareACMSymposiumonInteractive3DGraphicsandGames
[YCP08]YEHHCURTISSPATILSVANDENBERGJMANOCHADANDLINM2008CompositeAgentsIntheproceedingsofACMSIGGRAPHEurographicsSymposiumonComputerAnimation2008
[ZSXL06]ZHANGFSUNHXULANDLUNLK2006ParallelͲsplitshadowmapsforlargeͲscalevirtualenvironmentsInVRCIArsquo06Proceedingsofthe2006ACMinternationalconferenceonVirtualrealitycontinuumanditsapplicationsACMNewYorkNYUSApp311ndash318
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
AdvancesinRealͲTimeRenderingin3DGraphicsandGamesCoursendashSIGGRAPH2008NTatarchuk(Editor)
100|P a g e
AppendixA Solve for roots of quadratic equation float2 EvalQuadratic( float a float b float c ) float2 roots rootsx = (-b + sqrt( bb-4ac ))(2a) rootsy = (-b - sqrt( bb-4ac ))(2a) if( bb lt= 4ac ) roots = float2( INF-1 INF-1 ) return roots Solve for the the potential of the current position based on the potential of the neighbors and the cost of moving here from there Refer to Jeong A Fast Eikonal Equation Solver for Parallel Systems 2007 float QuadraticSolver( float fPhiMx float fPhiMy float fCostMx float fCostMy ) float a = fPhiMx float b = fPhiMy float c = fCostMx float d = fCostMy float a1 = c c + d d float b1 = -(2 a d d + 2 b c c) float c1 = a a d d + b b c c ndash c c d d float2 roots = EvalQuadratic( a1 b1 c1 ) float fTmp = max( rootsx rootsy ) return fTmp float EvaluateFiniteDifference( float fPhi float fCost float4 vPhi float4 vCost ) float fPhiX fPhiY fCostX fCostY float fPhiN = vPhi[0] fPhiS = vPhi[1] fPhiW = vPhi[2] fPhiE = vPhi[3] float fCostN = vCost[0] fCostS = vCost[1] fCostW = vCost[2] fCostE = vCost[3] ====Calculate upwind direction for X==== if( fPhiW lt INF || fPhiE lt INF ) Figure out if west or east are cheaper if( fPhiW + fCostW lt= fPhiE + fCostE ) fPhiX = fPhiW fCostX = fCostW else fPhiX = fPhiE fCostX = fCostE
Listing 11 HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver
Chapter 3 March of the Froblins
101|P a g e
====Calculate upwind direction for Y==== if( fPhiN lt INF || fPhiS lt INF ) bInvalidY = false Figure out if north or south are cheaper if( fPhiN + fCostN lt= fPhiS + fCostS ) fPhiY = fPhiN fCostY = fCostN else fPhiY = fPhiS fCostY = fCostS Save for new potential in this location by solving quadratic float result = 0 result = QuadraticSolver( fPhiX fPhiY fCostX fCostY ) result = min( min( fPhiY + fCostY fPhiX + fCostX ) result ) Potential should only be decreasing result = ( result gt fPhi ) fPhi result float4 EikonalSolverIteration() float4 vCurPhi = tPhiMapSampleLevel( sPhiPoint vvUV 0 ) float4 vCurCost = tCostMapSampleLevel( sCostPoint vvUV 0 ) Fetch potential values Fetches out of domain = INF float4 vPhiN = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0-1 ) ) float4 vPhiS = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 0 1 ) ) float4 vPhiW = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2(-1 0 ) ) float4 vPhiE = tPhiMapSampleLevel( sPhiPoint vvUV 0 int2( 1 0 ) ) Fetch potential values Fetches out of domain = 10000 float4 vCostN = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0-1 ) float4 vCostS = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 0 1 ) float4 vCostW = tCostMapSampleLevel( sCostPoint vvUV 0 int2(-1 0 ) float4 vCostE = tCostMapSampleLevel( sCostPoint vvUV 0 int2( 1 0 ) float4 vPhi [unroll] for( int i = 0 i lt 4 i++ ) vPhi[i] = EvaluateFiniteDifference( vCurPhi[i] vCurCost[i] float4( vPhiN[i] vPhiS[i] vPhiW[i] vPhiE[i] ) float4( vCostN[i] vCostS[i] vCostW[i] vCostE[i] ) ) return vPhi
Listing 11 (cont) HLSL code for iterative eikonal solver