arXiv:2105.12105v1 [astro-ph.CO] 25 May 2021

DRAFT VERSION MAY 26, 2021
Typeset using LATEX twocolumn style in AASTeX62

ADDGALS: Simulated Sky Catalogs for Wide Field Galaxy Surveys

RISA H. WECHSLER,1, 2, 3 JOSEPH DEROSE,4 MICHAEL T. BUSHA,2, 5 MATTHEW R. BECKER,6 ELI RYKOFF,2, 3 AND AUGUST EVRARD7

1Department of Physics, Stanford University, 382 Via Pueblo Mall, Stanford, CA 94305, USA
2Kavli Institute for Particle Astrophysics & Cosmology, P. O. Box 2450, Stanford University, Stanford, CA 94305, USA
3SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
4Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 93720, USA
5Present Address: Securiti
6High-Energy Physics Division, Argonne National Laboratory, Lemont, IL 60439, USA
7Departments of Physics and Astronomy, University of Michigan, Ann Arbor, MI

ABSTRACT

We present a method for creating simulated galaxy catalogs with realistic galaxy luminosities, broad-band colors, and projected clustering over large cosmic volumes. The technique, denoted ADDGALS (Adding Density Dependent GAlaxies to Lightcone Simulations), uses an empirical approach to place galaxies within lightcone outputs of cosmological simulations. It can be applied to significantly lower-resolution simulations than those required for commonly used methods such as halo occupation distributions, subhalo abundance matching, and semi-analytic models, while still accurately reproducing projected galaxy clustering statistics down to scales of $r \sim 100\,h^{-1}\,{\rm kpc}$. We show that ADDGALS catalogs reproduce several statistical properties of the galaxy distribution as measured by the Sloan Digital Sky Survey (SDSS) main galaxy sample, including galaxy number densities, observed magnitude and color distributions, as well as luminosity- and color-dependent clustering. We also compare to cluster–galaxy cross correlations, where we find significant discrepancies with measurements from SDSS that are likely linked to artificial subhalo disruption in the simulations. Applications of this model to simulations of deep wide-area photometric surveys, including modeling weak-lensing statistics, photometric redshifts, and galaxy cluster finding, are presented in DeRose et al. (2019a), and an application to a full cosmology analysis of Dark Energy Survey (DES) Year 3 like data is presented in DeRose et al. (2021a). We plan to publicly release a 10,313 square degree catalog constructed using ADDGALS with magnitudes appropriate for several existing and planned surveys, including SDSS, DES, VISTA, WISE, and LSST.

Keywords: cosmology:theory — galaxies:halos — galaxies:evolution — large-scale structure of the universe —dark matter — simulations

1. INTRODUCTION

Cosmology and the study of galaxy formation are undergoing a renaissance driven by exponential increases in computing power, the public availability of large amounts of high-quality sky survey data, and continued investment in ever-more sensitive instrumentation. These trends place stringent demands on the accuracy of the theoretical models used to analyze such survey data.

The best current models of the Universe posit that hierarchical structure formation via the gravitational collapse of cold dark matter drives the formation and evolution of galaxies. Simultaneously, galaxies serve as a rich set of tracers of the cosmic density and velocity fields, imparting the galaxy distribution with sensitivity to fundamental physics like cosmic acceleration, modifications to General Relativity, massive neutrinos, and the micro-physical nature of dark matter (e.g., Weinberg et al. 2013). This rich discovery potential demands a precise connection between the galaxy and matter distributions, particularly on small cosmic scales, which hold immense statistical information. The galaxy–dark matter connection is a key source of theoretical uncertainty in galaxy survey analyses concerned with constraints on fundamental physics.

The context above is driving two important trends in cosmology. First, researchers are developing a wealth of methods that aim to infer the connection between galaxies and their dark matter halos (see Wechsler & Tinker 2018 for a review). These studies usually employ large-volume, high-resolution N-body simulations of structure formation. Second, cosmologists now routinely employ "synthetic" or "mock" catalogs of galaxies to support analyses of survey data. These synthetic catalogs are constructed with a wide


http://orcid.org/0000-0003-2229-011X

http://orcid.org/0000-0002-0728-0960

http://orcid.org/0000-0001-7774-2246

2 WECHSLER, DEROSE ET AL.

variety of techniques that draw from advances in understanding the connection between galaxies and halos. Published examples focusing on modeling galaxy populations in realistic survey lightcones include approaches using halo occupation distributions (HOD) (Yan et al. 2004; Manera et al. 2013; Sousbie et al. 2008; Fosalba et al. 2015; Crocce et al. 2015; Smith et al. 2017; Harnois-Déraps et al. 2018; Stein et al. 2020), semi-analytic models (SAM) (Eke et al. 2004; Cai et al. 2009; Merson et al. 2013; Somerville et al. 2021), subhalo abundance matching (SHAM) (Gerke et al. 2013; Safonova et al. 2021), or a combination of the above (Korytov et al. 2019).

In this work, we present ADDGALS (Adding Density-Determined Galaxies to Lightcone Simulations), a computationally inexpensive, but high-fidelity approach for constructing synthetic galaxy catalogs from lightcone simulations, designed to support the analysis of large-area galaxy survey data. Unlike HOD, SAM, and SHAM approaches, it is designed specifically to populate modest resolution N-body simulations with galaxies that have realistic luminosities, spectral energy distributions (SEDs), and clustering. Except for a few percent of galaxies occupying the most massive halos, ADDGALS is not sensitive to these N-body simulations' lack of convergence at the smallest scales. The modest expense of these simulations enables the creation of large numbers of large-volume realizations of the Universe, which are often required by modern survey analyses.

With its modest computational requirements, ADDGALS can be used to bring a new level of realism to survey analysis tasks that require a statistical sample of synthetic catalogs. Examples of these tasks include generating covariance matrices or testing the robustness of these analyses to key systematic effects through direct, end-to-end tests where the true answer is known. MacCrann et al. (2018) and DeRose et al. (2021b) present key examples of the latter approach. These works combine the Dark Energy Survey (DES) year one (Y1) and year three (Y3) analysis pipelines with 18 synthetic catalogs produced with the methodology presented in this work to perform end-to-end tests of a 3×2-point weak lensing and galaxy clustering analysis; To et al. (2021) performed a similar analysis that combined these statistics with cluster counts and cluster–galaxy cross correlations. Due to the realism of the ADDGALS catalogs, they were able to use the same analysis pipeline as was applied to the DES data. These tests depended critically on simulated catalogs that jointly modeled several effects, including various observables (e.g. galaxy clustering, galaxy-galaxy lensing, cosmic shear, and cluster counts), photometric redshifts, and survey effects like varying depth maps. Further, tens of realizations were needed to demonstrate that the recovered parameters were accurate to well below the sensitivity of the DES measurements themselves. These kinds of tests will be increasingly important as surveys begin to produce stringent constraints on fundamental physics, motivating an approach that is able to model large volumes and multiple observables with modest computational cost.

As emphasized by Wechsler & Tinker (2018), currently used approaches for modeling galaxies within large-scale structure face trade-offs between the fidelity of the modeled properties, the resolution or computational requirements of the method, and the degree to which the model is physics-driven or empirically data-driven. For example, the subhalo abundance matching approach (SHAM), which assumes that all galaxies are placed on resolved halos and subhalos, has been shown to faithfully reproduce the spatial distribution of galaxies in the local Universe where it can best be measured (Kravtsov et al. 2004; Reddick et al. 2013; Chaves-Montero et al. 2016; Lehmann et al. 2017; Contreras et al. 2020; DeRose et al. 2021a), as well as the evolution of the galaxy population with time (Conroy et al. 2006a; Moster et al. 2011; Behroozi et al. 2013a), with a very small number of parameters. This technique has the advantage of including several important correlations between halo history, galaxy populations, and environment that are neglected by some other methods, but has stringent resolution requirements.

One of the most commonly used methods, populating simulations with galaxies using a halo occupation distribution (HOD; e.g. Jing 1998; Seljak 2000; Bullock et al. 2002; Yang et al. 2003a; Berlind & Weinberg 2002; Zheng et al. 2005; Mandelbaum et al. 2006; van den Bosch et al. 2007; Zehavi et al. 2011a; Zu & Mandelbaum 2015), places each galaxy in regions within resolved host halos, irrespective of dark matter substructures. This reduces the computational requirements on the simulations compared with methods that trace halo histories, but generally requires more parameters than abundance matching and may be missing relevant correlations between galaxy populations and halo history. The conditional luminosity function method (e.g. Yang et al. 2003b; Cooray 2006) has similar requirements. The computational expense of HOD modeling can be further decreased by employing approximate or low-resolution methods for generating halo catalogs (e.g. Bond & Myers 1996; Scoccimarro & Sheth 2002; Kitaura et al. 2016; Chuang et al. 2015; Monaco et al. 2013; Tassev et al. 2013; White et al. 2014; Avila et al. 2015; Feng et al. 2019; Izard et al. 2018; Balaguera-Antolínez et al. 2019).

SAMs (White & Frenk 1991; Kauffmann et al. 1993; Somerville & Primack 1999; Cole et al. 2000; Benson et al. 2002; Bower et al. 2006; Benson 2012; Guo et al. 2013; Croton et al. 2016) generally require a degree of resolution between abundance matching and the HOD; the former is more relevant if one wants to trace the histories of all galaxies properly and if one wants to keep every galaxy on a resolved substructure. This can be reduced to the less demanding requirements of the HOD if semi-analytic methods are used to track halo histories and the kinematics of satellite galaxies in larger systems (Benson 2012; Jiang & van den Bosch 2016; Yang et al. 2020; Jiang et al. 2021).

Concurrent with the development and use of the method described here, significant progress has been made in other data-driven approaches, particularly in empirical models that use information from halo histories. These approaches include extensions to the abundance matching approach like conditional abundance matching, which associates color or star formation rates with secondary halo properties (Masaki et al. 2013; Hearin & Watson 2013; Hearin et al. 2014a; Yamamoto et al. 2015; Saito et al. 2016; Contreras et al. 2020), and models that trace galaxy histories through the histories of their halos (e.g. Becker 2015; Moster et al. 2018; Behroozi et al. 2019) and constrain their properties with a wide range of data.

Finally, full-physics cosmological hydrodynamics methods (see Vogelsberger et al. 2020, for a recent review) are making steady progress in describing the galaxy–halo connection. A recent verification of three independent simulations examines the satellite galaxy occupation conditioned on total halo mass and redshift, finding a consistent form for the probability density function along with slightly super-Poisson dispersion, but with mean counts varying by tens of percent (Anbajagane et al. 2020). These simulations are computationally expensive and thus challenging to use to model large survey volumes, but they can be used to inform HOD approaches and test SAM or empirical methods, and provide essential input into possible modification of the dark matter distribution from baryonic processes.

ADDGALS' combination of realism and relatively low computational expense owes to a machine-learning style approach that uses higher-resolution N-body simulations to train the galaxy–dark matter connection scales and data to train a physically motivated model for the dependence of galaxy properties on local density. This approach is similar in spirit to other recent work that employs statistical learning techniques to connect the dark matter distribution to the distribution of biased tracers (Modi et al. 2018; Berger & Stein 2019; Ramanah et al. 2019; Zhang et al. 2019; Tröster et al. 2019; Dai & Seljak 2021). The features that ADDGALS uses are chosen to be relatively insensitive to resolution effects in the simulations, while still encapsulating quantities relevant to the physics of galaxy formation.

A flowchart with the key steps of the algorithm is given in fig. 1. The ADDGALS algorithm can be divided into two main parts, the assignment of luminosities and the assignment of SEDs. In the first part, we fit a model to the distribution of galaxy absolute magnitude at fixed local overdensity using a high-fidelity model of the galaxy–halo connection. In this work, we use a SHAM model applied to high-resolution structure formation simulations, but this choice is not essential. We then use these distributions to populate a low-resolution simulation via Monte Carlo. This process is illustrated in steps 1 through 3 in the flowchart. In the second part of the ADDGALS algorithm, we use a conditional abundance matching model fit to the Sloan Digital Sky Survey (SDSS) in DeRose et al. (2021a) to assign an SED to each galaxy. Finally, we apply observational effects to produce the complete catalog. These parts correspond to steps 4 and 5 in the flowchart.

We demonstrate that these steps are able to reproduce the absolute magnitude dependent two-point clustering and halo occupation properties of the SHAM catalog. Further, we show that they reproduce a number of additional observed properties of SDSS galaxies, including their color distributions at a given absolute magnitude and the qualitative trends of the observed color-dependent clustering.

The model that is perhaps most similar to the one presented in this work is GALSAMPLER (Hearin et al. 2020), in that it places galaxies from a high-fidelity model of galaxy formation run on high-resolution simulations into the halos of lower resolution simulations. The main distinguishing factor is the use of halo mass as the conditional variable in GALSAMPLER, whereas ADDGALS uses local Lagrangian density, which can be measured in significantly lower-resolution simulations.

In this work, ADDGALS is trained on a SHAM model, but the machine-learning style approach taken by ADDGALS generalizes to training on other models of the galaxy–halo connection, including hydrodynamical simulations, SAMs, or empirical models that trace halo histories. Note that it is likely that secondary properties of the density field will be needed for these generalizations in analogy to secondary halo properties and assembly bias. This flexibility combined with modest computational requirements will enable the production of suites of synthetic catalogs with different underlying models for the galaxy–halo connection. These suites can then be used to test the robustness of cosmological constraints from surveys to underlying assumptions about galaxy formation.

ADDGALS has been in use for some time to facilitate a variety of applications of large-scale sky survey data, with a particular focus on wide-area photometric surveys. A preliminary description of the work was presented in Wechsler (2004). Subsequent work using these catalogs has made use of earlier versions than those described here; in most cases the important details were described in those papers. Because of the ability of these techniques to accurately model large, wide surveys including realistic photometry and lensing, the range of applications has been broad.

Catalogs produced with this method have been used extensively in the testing, systematics assessment, and co-analysis


[Figure 1 (flowchart). Left column, "Observational Input": Observed Clustering ($w_p(r_p)$ versus $r_p$ [$h^{-1}\,{\rm Mpc}$]), Luminosity Function ($\phi(M_r)$ versus $M_r$), and SED Training Set (SED template versus wavelength). Right column, "Algorithm", with steps as labeled in the figure: 0. Abundance match galaxy luminosities onto subhalos in high-resolution simulation (Appendix A). 1. Measure and fit $p(R_\delta|M_r)$ and $p(M_r|M_{\rm vir})$ (Section 3.3). 2. Populate lightcone using $p(R_\delta|M_r)$ and $p(M_r|M_{\rm vir})$ (Section 3.1-3.2). 3. Conditional abundance match SEDs onto galaxies (Section 5). 5. Apply observational effects (e.g. photometric errors, masks, weak lensing, etc.) (Section 6). Inset panels show $p(R_\delta)$ versus $R_\delta$ and $r_p w_p(r_p)$ versus $r_p$ [$h^{-1}\,{\rm Mpc}$].]

Figure 1. Flowchart of the ADDGALS algorithm. Observational inputs are listed in the left hand column. In the first step, we use observed clustering and luminosity functions to constrain a SHAM model, applied to a simulation with resolved substructures (appendix C). In the second step, we measure and fit a model for central galaxies given halo mass (section 3.1) and for the dark matter density $R_\delta$ given luminosity for all other galaxies (section 3.3). In the third step, we populate a lightcone using this algorithm. In the fourth step, we use an observed galaxy sample with luminosities and SED properties to conditional abundance match SEDs onto simulated galaxies (section 5). Finally, we apply observational effects (section 6).


of galaxy cluster catalogs and results with the MAXBCG and REDMAPPER algorithms (Koester et al. 2007b,a; Johnston et al. 2007; Rozo et al. 2007a,b; Becker et al. 2007; Sheldon et al. 2009; Hansen et al. 2009a; Tinker et al. 2012; Dietrich et al. 2014; Farahi et al. 2016; Varga et al. 2019a; Abbott et al. 2020a; To et al. 2021; Myles et al. 2020), for the improvement and testing of other galaxy cluster finders (Miller et al. 2005; Dong et al. 2008; Hao et al. 2010; Soares-Santos et al. 2011; Bleem et al. 2015), for the development and testing of a number of photometric redshift algorithms, especially in the context of DES (Gerdes et al. 2010; Cunha et al. 2012, 2014; Bonnett et al. 2016; Leistedt et al. 2016; Hoyle et al. 2018; Gatti et al. 2018a; Cawthon et al. 2018; Buchs et al. 2019; Myles et al. 2021; Gatti et al. 2020; Cawthon et al. 2020) and LSST (Malz et al. 2018; Schmidt et al. 2020), for the development of various survey analysis approaches using weak lensing shear (VanderPlas et al. 2012; Chang & Jain 2014; Szepietowski et al. 2014; Becker et al. 2016; Troxel et al. 2018; Friedrich et al. 2018a; Chang et al. 2018a; Bradshaw 2019), for the development and systematics testing of various galaxy and cluster cross correlations (High et al. 2012; Bleem et al. 2012; Shin et al. 2019; Pandey et al. 2019) and other statistics of the galaxy and lensing spatial distribution (Friedrich et al. 2018b; Gruen et al. 2018a), and for early preparation and testing of the science prospects of the DES (Gill et al. 2009; Davies et al. 2013; Chang et al. 2015; Park et al. 2016; Asorey et al. 2016), the WFIRST survey (Martens et al. 2019; Massara et al. 2020), and the LSST survey (Mao et al. 2018) as well as spectroscopic surveys (Saunders et al. 2014; Nord et al. 2016).

In DeRose et al. (2019a), we described the use of ADDGALS to create a suite of synthetic catalogs for the DES, extending the work described here to higher redshift while including a number of additional observational effects and presenting tests of a number of additional observables, including those related to cosmic shear, photometric redshifts, high redshift galaxy clustering and lensing, and photometric cluster finding. That suite of catalogs as well as earlier versions were used extensively in the analysis of DES Science Verification and Y1 data (Abbott et al. 2018; Krause et al. 2017; MacCrann et al. 2018; Gruen et al. 2018b; Sánchez et al. 2017; Clampitt et al. 2017; Davis et al. 2018; Cawthon et al. 2018; Varga et al. 2019b; Gatti et al. 2018b; Chang et al. 2018b; Abbott et al. 2020b). Catalogs produced using ADDGALS have continued to be used to facilitate DES Y3 cosmology analyses (Buchs et al. 2019; Myles et al. 2021; Gatti et al. 2020; Cawthon et al. 2020), and a description of the catalogs used for that work is given in DeRose et al. (2021b).

This paper proceeds as follows. In section 2, we describe the simulations used in this work. In section 3, we describe our method of populating simulations with galaxies in a single-band rest-frame absolute magnitude. Tests of this part of the algorithm are presented in section 4. In section 5, we outline our method for assigning spectral energy distributions to simulated galaxies. Tests of this method are presented in section 7. In section 8, we discuss the resolution requirements for ADDGALS. Finally, we conclude in section 9 with a discussion of the strengths and limitations of the algorithm and future directions of research. Throughout this manuscript, we quote magnitudes using the AB system and h = 1.0 units.

2. N-BODY SIMULATIONS

All simulations in this work were run using the code L-GADGET2 (Springel et al. 2005), a proprietary version of GADGET-2 optimized for memory efficiency and explicitly designed to run large-volume, dark matter-only N-body simulations. We have modified this code to create a particle lightcone output on the fly (see DeRose et al. 2019a, for details). Initial conditions were generated with the second-order Lagrangian perturbation theory code 2LPTIC (Crocce et al. 2006) using linear power spectra computed with the CAMB code (Lewis 2004). Early versions of these simulations were generated on XSEDE supercomputers using the Apache Airavata$^1$ workflow management framework (Erickson et al. 2012).

We use four N-body simulations with volumes of $(400\,h^{-1}\,{\rm Mpc})^3$, $(1.05\,h^{-1}\,{\rm Gpc})^3$, $(2.6\,h^{-1}\,{\rm Gpc})^3$, and $(4.0\,h^{-1}\,{\rm Gpc})^3$; the simulation parameters are summarized in table 1. The first of these, deemed T1 (Training Simulation 1), requires sufficient resolution that a SHAM approach can reasonably model the galaxy distribution down to roughly $M_r = -19$ (see e.g. Reddick et al. 2013; Lehmann et al. 2017). This simulation has $L_{\rm box} = 400\,h^{-1}\,{\rm Mpc}$ and $2048^3$ particles. At this resolution, the SHAM catalog is not strictly complete down to $M_r = -19$, as subhalos that would host galaxies with $M_r < -19$ near the cores of massive hosts become stripped and are no longer trackable by the halo finder as they have too few particles (see Reddick et al. 2013 for a detailed discussion). However, comparisons with SDSS data show that the resolution is sufficient to model the observed two-point function within current observational constraints down to $M_r = -19$, except on the very smallest scales for the dimmest galaxies in this sample. It also does reasonably well for galaxies fainter than this limit (see DeRose et al. 2021a, for further discussion). The inability to accurately model small-scale clustering of the faintest samples owes in large part to subhalo disruption in the T1 simulation. This is discussed at greater length in section 8, where we compare with the C250 simulation, run with the same settings as the T1 simulation, but in a volume of $(250\,h^{-1}\,{\rm Mpc})^3$, using $2560^3$ particles and a force softening of $\epsilon = 0.8\,h^{-1}\,{\rm kpc}$. A lightcone output is not necessary for the T1 simulation, but merger trees are required to construct the abundance matching catalog. We save 100 simulation snapshots logarithmically spaced from z = 12 to z = 0, which allows for the construction of accurate merger trees.

1 https://airavata.apache.org/

For the three larger simulations, L1, L2, and L3 (Lightcone Simulations 1–3), ten snapshots at redshifts

z = {0.0,0.10,0.25,0.4,0.5,0.7,0.85,1,2,3}

are produced, as well as lightcones with areas of 10,313 square degrees each (one quarter of the sky). The L1 simulation is used to produce the simulated galaxy catalog that we compare to SDSS in section 7, while L2 and L3 are used solely for the resolution tests presented in section 8 and for high-redshift lightcone construction, as described in DeRose et al. (2019a) and DeRose et al. (2021b). These simulations were run as part of the multi-resolution "Chinchilla" simulation suite; the higher-resolution simulations were first used in Mao et al. (2015) and Lehmann et al. (2017). When presented as "observed" catalogs, these simulations have been referred to as the "Buzzard" simulations.
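As a quick arithmetic check of the quoted lightcone area, the full sky subtends $4\pi$ steradians; converting to square degrees and taking one quarter recovers the number above. The variable names below are ours and purely illustrative.

```python
import math

# Full sky in square degrees: 4*pi steradians times (180/pi)^2 deg^2 per steradian.
FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2

# Each lightcone covers one quarter of the sky.
quarter_sky_deg2 = FULL_SKY_DEG2 / 4.0

print(round(quarter_sky_deg2))  # 10313
```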

2.1. Halo Finding

We identify halos with the publicly available adaptive phase-space halo finder ROCKSTAR$^2$ (Behroozi et al. 2013b). ROCKSTAR is highly efficient, and has excellent accuracy (see for example, the halo finder comparison in Knebe et al. 2011). It is particularly robust in galaxy mergers, important for the massive end of the halo mass function, and in tracking substructure, important for the abundance matching procedure applied to T1. We use $M_{\rm vir}$ strict spherical overdensity (SO) masses (Bryan & Norman 1998) here; additional halo mass definitions are output by ROCKSTAR using these halo centers. ROCKSTAR also outputs several other halo properties, including concentration, shape, and angular momentum (see Behroozi et al. 2013b, for details).

2.2. Merger Trees

For the highest resolution T1 simulation, we track the formation of halos using 100 saved snapshots between z = 12 and z = 0, equally spaced in $\ln a$. The gravitationally consistent merger tree algorithm$^3$ described in Behroozi et al. (2013c) is applied to construct halo merger trees. This algorithm explicitly checks for consistency in the gravitational evolution of dark matter halos between time steps, and leads to robust tracking. Details of the implementation and robustness tests can be found in Behroozi et al. (2013c). Using the resulting merger trees, we are able to track the peak virial mass, $M_{\rm peak}$, and velocity, $v_{\rm max}$, for each identified subhalo.
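To make the snapshot spacing concrete, the sketch below generates output redshifts that are equally spaced in $\ln a$, with $a = 1/(1+z)$. It assumes that both endpoints (z = 12 and z = 0) are included among the 100 outputs, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def snapshot_redshifts(z_start=12.0, z_end=0.0, n_snap=100):
    """Redshifts of n_snap outputs equally spaced in ln(a), where a = 1/(1+z).

    Assumption: both endpoints are included in the list of outputs.
    """
    ln_a_start = -math.log(1.0 + z_start)
    ln_a_end = -math.log(1.0 + z_end)
    step = (ln_a_end - ln_a_start) / (n_snap - 1)
    # Convert each ln(a) back to redshift: z = exp(-ln a) - 1.
    return [math.exp(-(ln_a_start + i * step)) - 1.0 for i in range(n_snap)]

zs = snapshot_redshifts()
print(len(zs), zs[0], zs[-1])
```

Equal spacing in $\ln a$ means that the ratio $(1+z_i)/(1+z_{i+1})$ is the same for every pair of adjacent snapshots.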

2 https://bitbucket.org/gfcstanford/rockstar/
3 https://bitbucket.org/pbehroozi/consistent-trees

2.3. Lagrangian Density Estimation

The final post-processing step for the dark matter simulations before we can run ADDGALS is to calculate the distance to the n-th nearest particle for both identified halos and all simulation particles. ADDGALS uses the relation $p(R_\delta|M_r,z)$, where $R_\delta$ is the distance to the n-th nearest particle, and n is the number of particles whose mass sums to $M_\delta = 1.8\times10^{13}\,h^{-1}\,M_\odot$. We measure this Lagrangian density, $R_\delta$, for every particle and halo in each of the simulations presented in this work.
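The quantity $R_\delta$ can be illustrated with a brute-force n-th nearest neighbor search. This is only a sketch under simplifying assumptions: it ignores periodic boundaries, scales quadratically with particle number, and counts a particle located exactly at the query point as its own first neighbor. A production calculation on simulations of this size would use a spatial tree or grid instead. The function name `r_delta` is ours.

```python
import math
import random

def r_delta(point, particles, n):
    """Distance from `point` to its n-th nearest particle.

    Brute force, with no periodic wrapping (a simplifying assumption).
    """
    if not 1 <= n <= len(particles):
        raise ValueError("n must be between 1 and the number of particles")
    dists = sorted(math.dist(point, p) for p in particles)
    return dists[n - 1]

# Toy usage: 2000 uniformly distributed particles in a unit box.
random.seed(0)
particles = [(random.random(), random.random(), random.random()) for _ in range(2000)]
print(r_delta((0.5, 0.5, 0.5), particles, n=50))
```

A smaller $R_\delta$ at fixed n corresponds to a denser local environment, since the same enclosed mass is reached within a smaller radius.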

3. CONNECTING GALAXIES TO THE MATTER DISTRIBUTION

We start by describing the first part of the ADDGALS algorithm, which populates a dark-matter-only simulation with galaxies using a model trained on a higher-resolution simulated galaxy catalog. The algorithm is designed to work on the matter distribution from either a simulation snapshot or lightcone output. A key strength of the algorithm is its ability to use relatively low-resolution dark matter simulations. Consequently, we operate directly on the dark matter particle distribution, using the density information described in section 2.3 to assign galaxy properties. The algorithm is designed to insert galaxies with single-band absolute magnitudes. While this quantity could be chosen to be anything that is reasonably well-correlated with density, in the present work we use the SDSS r-band magnitude k-corrected to z = 0.1, $M_r^{0.1}$, hereafter $M_r$.

Here we train ADDGALS to reproduce the galaxy–dark matter connection in an abundance matching (SHAM) model. We use the best-fit model from Lehmann et al. (2017) to assign galaxies to dark matter halos. This procedure, including the luminosity function and implementation details, is described in appendix A. In principle, the same type of mapping can be tuned to other catalogs, such as those constructed with SAMs, hydrodynamical simulations, or other empirical models. ADDGALS is able to approximate catalogs produced by the SHAM model because local density measurements contain information about halo mass and halo-centric distance, allowing for the accurate reproduction of the HOD and radial profiles of the catalog that ADDGALS is tuned to. Limitations of the SHAM catalog that ADDGALS is tuned to, such as the effects of artificial subhalo disruption on the satellite populations of massive halos, are also inherited.

Broadly, this part of the ADDGALS algorithm proceeds in two steps, described in the following subsections:

1. Central galaxies are placed on all resolved host halos above some minimum halo mass threshold, $M_{\rm min}$, as listed in table 1 (section 3.1).

2. The one-dimensional probability density function (PDF) of local dark matter density around galaxies conditioned on absolute magnitude measured from the SHAM model is used to assign the rest of the galaxies (section 3.2).

Table 1. Description of simulations used for training and lightcone construction. In this work L2 is mainly used for resolution tests, and L3 is only used for high redshift lightcone construction. Columns describe the simulation name, the minimum and maximum redshifts spanned by that simulation (z_min and z_max), the periodic box size used to generate the lightcones (L_box), the number of particles used in each simulation and the particle mass (N_part and m_part), as well as the force softening length (eps_Plummer) and minimum halo mass that central galaxies are populated in (M_halo,min).

Name  z_min          z_max          L_box          N_part  m_part [h^-1 M_sun]  eps_Plummer [h^-1 kpc]  M_halo,min [h^-1 M_sun]
T1    training only  training only  400 h^-1 Mpc   2048^3  4.8 × 10^8           5.5                     –
L1    0.0            0.32           1.05 h^-1 Gpc  1400^3  3.3 × 10^10          20                      6 × 10^12
L2    0.32           0.84           2.6 h^-1 Gpc   2048^3  1.6 × 10^11          35                      6 × 10^12
L3    0.84           2.35           4.0 h^-1 Gpc   2048^3  5.9 × 10^11          53                      10^13

3.1. Populating Resolved Central Galaxies

A statistical relationship between halo mass and its primary galaxy's absolute magnitude, $p(M_{r,{\rm cen}}|M_{\rm vir})$, is assumed in order to populate central galaxies. The mean of this distribution is given by

\begin{equation}
\langle M_{r,{\rm cen}}\rangle(M_{\rm vir}) = M_{r,0} - 2.5\left[a\log x - \frac{1}{k}\log\left(1 + x^{bk}\right)\right], \tag{1}
\end{equation}

where $x = M_{\rm vir}/M_c$ and $a$, $b$, $k$, $M_c$, and $M_{r,0}$ are redshift-dependent fitting parameters. This form was proposed by Vale & Ostriker (2006) to match early SHAM catalogs and has been shown to provide a good fit to observational catalogs (Hansen et al. 2009a; Zheng et al. 2007). A Gaussian scatter in absolute magnitude at fixed mass is assumed such that a halo with mass $M_{\rm vir}$ is assigned a magnitude drawn from

\begin{equation}
p(M_{r,{\rm cen}}|M_{\rm vir}) = \mathcal{N}\left(\langle M_{r,{\rm cen}}\rangle(M_{\rm vir}),\, \sigma_{M_{r,{\rm cen}}}\right), \tag{2}
\end{equation}

where $\sigma_{M_{r,{\rm cen}}} = 0.425$, matching the scatter assumed in the SHAM model. Tests have shown that this relation must be applied at least for all host halos more massive than $M_{\rm vir} \sim 10^{13}\,h^{-1}\,M_\odot$ to accurately reproduce the projected clustering of luminosity-selected galaxies (see section 8 for further discussion of resolution requirements).
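Equations (1) and (2) translate directly into a two-step draw: evaluate the mean magnitude for a halo, then add Gaussian scatter. The sketch below assumes that "log" denotes the base-10 logarithm, as is conventional for magnitudes. The parameter values in `toy_params` are placeholders chosen for illustration only and are not the fitted values; the function names are ours.

```python
import math
import random

def mean_mr_cen(m_vir, m_r0, m_c, a, b, k):
    """Mean central magnitude from eq. (1); logs assumed to be base 10."""
    x = m_vir / m_c
    return m_r0 - 2.5 * (a * math.log10(x) - (1.0 / k) * math.log10(1.0 + x ** (b * k)))

def sample_mr_cen(m_vir, params, sigma=0.425, rng=random):
    """Draw one central magnitude from the Gaussian in eq. (2)."""
    return rng.gauss(mean_mr_cen(m_vir, **params), sigma)

# Placeholder parameters for illustration only; NOT the fitted values.
toy_params = dict(m_r0=-20.0, m_c=1.0e12, a=4.0, b=3.7, k=1.0)

rng = random.Random(42)
for m_vir in (1.0e12, 1.0e13, 1.0e14):
    print(m_vir, mean_mr_cen(m_vir, **toy_params), sample_mr_cen(m_vir, toy_params, rng=rng))
```

With these toy values the relation steepens below $M_c$ and flattens above it, so more massive halos host brighter (more negative magnitude) centrals, but with diminishing returns at high mass.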

Equation (1) is then fit to the SHAM catalog in each time snapshot over the mass range $10^{12}\,h^{-1}\,M_\odot \leq M_{\rm vir} < 10^{15}\,h^{-1}\,M_\odot$. When populating lightcone simulations, the fit from the snapshot that is closest to the redshift under consideration is used. Evolution in this relation over the redshift ranges between snapshots is negligible. Validation of this relation is described in appendix D. Once $p(M_{r,{\rm cen}}|M_{\rm vir})$ has been determined, it is used to populate all central galaxies down to the halo mass limits in table 1 by sampling from the distribution in eq. (2), conditioned on the mass of each host halo in the simulation. We note that the resolution of L1 would enable us to go to lower masses, but this is not required to match the clustering properties of the higher-resolution simulation (section 8), so we keep this limit constant between L1 and L2 for continuity.
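The nearest-snapshot lookup described above can be sketched as follows. We assume here that "closest" is measured in redshift rather than, for example, in scale factor, which the text does not specify. The snapshot redshifts and parameter values in `toy_fits` are hypothetical placeholders.

```python
def nearest_snapshot_fit(z, fits_by_redshift):
    """Return (snapshot redshift, fit parameters) for the snapshot closest to z.

    Assumption: closeness is measured as |z_snap - z|.
    """
    z_snap = min(fits_by_redshift, key=lambda zs: abs(zs - z))
    return z_snap, fits_by_redshift[z_snap]

# Hypothetical snapshot redshifts and placeholder fit parameters (illustration only).
toy_fits = {
    0.0: {"m_r0": -20.0},
    0.25: {"m_r0": -20.1},
    0.5: {"m_r0": -20.2},
}

print(nearest_snapshot_fit(0.3, toy_fits))
```

Because the relation evolves negligibly between adjacent snapshots, a simple lookup of this kind avoids the need to interpolate the fit parameters in redshift.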

3.2. Populating Galaxies in Unresolved Structures

The resolution of the lightcone simulations used here is such that central galaxies assigned using the method described above constitute only a small fraction of all galaxies that would be observed by deep photometric surveys. To populate the rest of the galaxies, the relationship between large-scale dark matter density and galaxy rest-frame magnitude, $p(R_\delta|M_r,z)$, is used. $R_\delta$ is defined as the radius enclosing a mass scale of $M = 1.8\times10^{13}\,h^{-1}\,M_\odot$, characterizing the local dark matter density around galaxies. For the L1 simulation, this radius corresponds to the distance to the 538th nearest dark matter particle. Section 3.3 describes how this relation is determined from a SHAM catalog. This mass scale is roughly equivalent to $M_*$, the typical collapsing halo mass, at z = 0 for this cosmology, and thus effectively distinguishes between halos of different biases.
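The neighbor count follows from dividing the mass scale by the particle mass. Using the L1 particle mass as rounded in table 1 gives roughly 545 rather than the quoted 538. The quoted value corresponds to a particle mass near $3.35\times10^{10}\,h^{-1}\,M_\odot$, so the difference is presumably due to rounding of the tabulated value. Variable names below are ours.

```python
M_DELTA = 1.8e13    # h^-1 M_sun, mass scale defining R_delta (from the text)
M_PART_L1 = 3.3e10  # h^-1 M_sun, L1 particle mass as rounded in table 1

# Number of particles whose summed mass reaches M_DELTA, using the rounded mass.
n_neighbors = M_DELTA / M_PART_L1
print(round(n_neighbors))  # 545

# Particle mass implied by the quoted 538th nearest neighbor.
implied_m_part = M_DELTA / 538
print(f"{implied_m_part:.3e}")
```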

In order to parallelize the ADDGALS algorithm, we divide the lightcone simulations into domains of approximately $(200\,h^{-1}{\rm Mpc})^3$ in volume. This is accomplished by dividing the sky in angle as well as redshift. In a given patch with a redshift range of $z_{\rm low} < z < z_{\rm high}$, we create a catalog of galaxies with magnitudes and redshifts $\{M_{r,i}, z_i\}$, where $i = 1, \ldots, N$, and $N$ is the total number of galaxies, given by:

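\begin{equation}
N = \int_{z_{\rm low}}^{z_{\rm high}} dz\,\frac{dV}{dz} \int_{-\infty}^{M_{r,\rm min}(z)} dM_r\,\phi_{\rm unres}(M_r,z), \tag{3}
\end{equation}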

where $M_{r,\rm min}$ is the faintest absolute magnitude to which we populate galaxies, typically chosen to yield a catalog complete to a particular observed $r$-band magnitude limit, and $\phi_{\rm unres}$ is the luminosity function of all objects to be placed on unresolved structures in the simulation. This function subtracts central galaxies from the total luminosity function in order to avoid double-populating bright galaxies, and thus is specified by

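\begin{align}
\phi_{\rm unres}(M_r,z) &= \phi(M_r,z) - \phi_{\rm res}(M_r,z) \nonumber \\
&= \phi(M_r,z) - \int_{M_{\rm min}}^{\infty} dM_{\rm vir}\, p(M_{r,\rm cen}|M_{\rm vir},z)\, n(M_{\rm vir},z), \tag{4}
\end{align}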

where $n(M_{\rm vir},z)$ is the halo mass function in the simulations. We modulate the normalization of the luminosity function by the local dark matter overdensity,

\begin{equation}
\phi_{i,\rm local} = \phi_i\,(1 + \delta). \tag{5}
\end{equation}

Here $\phi_{i,\rm local}$ is the normalization of the luminosity function in the local domain $i$ and $\delta$ is the matter overdensity within the domain. This avoids fixing the number density of galaxies on the scale of the domain size, but can induce a scale-dependent bias on scales of similar size to the domains. We can see the reason for this as follows. Note that $b(r) = \delta_g/\delta_m$, where the overdensities are smoothed on a radius $r$. Equation (5) enforces $\delta_g(r > r_{\rm domain}) = \delta_m(r_{\rm domain})$, where $\delta_m(r_{\rm domain})$ is the matter overdensity on the scale of the domain size. Thus, $b(r > r_{\rm domain}) = \delta_m(r_{\rm domain})/\delta_m(r)$, and is scale dependent for $r > r_{\rm domain}$. The domains are taken to be at least $(200\,h^{-1}{\rm Mpc})^3$ and thus the effect of this choice is negligible for most applications. However, this choice may have an impact on covariance matrices involving scales larger than the domain size, which requires further investigation. The size of these domains is chosen as a compromise between this effect and the run time of the algorithm, as ADDGALS can be run in parallel over each domain.

Redshifts of each galaxy, $z_i$, are then drawn from

\begin{equation}
P(z) = \frac{1}{N}\frac{dV}{dz}\int_{-\infty}^{M_{r,\rm min}} dM_r\,\phi_{\rm unres}(M_r,z), \tag{6}
\end{equation}

and magnitudes, $M_{r,i}$, are drawn from $\phi_{\rm unres}(M_r,z_i)$. With magnitudes and redshifts assigned to every galaxy, densities $\{R_{\delta,i}\}$ are drawn from $p(R_\delta|M_{r,i},z_i)$, and each galaxy is then assigned to a particle, going from brightest to faintest, with the closest match in redshift and density that has not already been assigned. The details of this process are described in appendix C.
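The brightest-to-faintest assignment can be sketched as a greedy matching. The mismatch metric used below (absolute redshift difference plus absolute difference in $\ln R_\delta$, with a hypothetical weight `z_weight`) is an assumption made purely for illustration; the criterion actually used is the one described in appendix C, which is not reproduced here.

```python
import math


def assign_to_particles(galaxies, particles, z_weight=1.0):
    """Greedy matching of galaxies to particles, brightest galaxy first.

    galaxies:  list of (M_r, z, R_delta) draws
    particles: list of (z, R_delta) values measured in the simulation
    Returns a dict mapping galaxy index -> particle index.
    """
    free = set(range(len(particles)))
    # More negative M_r means brighter, so an ascending sort is brightest first.
    order = sorted(range(len(galaxies)), key=lambda i: galaxies[i][0])
    assignment = {}
    for gi in order:
        if not free:
            break  # more galaxies than particles: the remainder stay unassigned
        _, z_g, r_g = galaxies[gi]
        best = min(
            free,
            key=lambda pi: z_weight * abs(particles[pi][0] - z_g)
            + abs(math.log(particles[pi][1] / r_g)),
        )
        assignment[gi] = best
        free.remove(best)  # each particle can host at most one galaxy
    return assignment


toy_galaxies = [(-19.0, 0.10, 1.0), (-21.5, 0.10, 1.1)]
toy_particles = [(0.10, 1.0), (0.10, 2.0)]
toy_assignment = assign_to_particles(toy_galaxies, toy_particles)
```

In the toy case the brighter galaxy claims the particle with $R_\delta = 1.0$ even though the fainter galaxy matches it exactly, which shows how the ordering gives bright galaxies priority for the best-matching environments.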

At this point, the described algorithm produces a catalog of galaxies with $r$-band absolute magnitudes. This algorithm is a very efficient way to generate a large-volume synthetic catalog with faint galaxies using primarily simulations with modest resolution. Additionally, as long as abundance matching in the $r$-band works well at high redshifts, we expect that the galaxy distribution should match the clustering at a wide range of redshifts. Note that the same algorithm can also be used to populate comoving snapshots by fixing $z$ in the above equations to the redshift of the snapshot output.

The next section describes how $p(R_\delta|M_{r,i},z_i)$ is determined from a SHAM catalog on a high-resolution simulation.

As we show in section 4 and appendix D, with this algorithm we can create a galaxy catalog that matches the projected galaxy two-point function, halo occupation distribution, conditional luminosity function, and galaxy profiles in halos of a galaxy catalog populated using SHAM in a higher-resolution simulation.

3.3. Determining the p(Rδ|Mr,z) relation

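The form of $p(R_\delta|M_r,z)$ is the crux of the ADDGALS algorithm. We have found that the following bi-modal form, the sum of a log-normal and a Gaussian component, is a good fit to our simulations,

\begin{align}
p(R_\delta|M_r < x,z) &= p(R_\delta;\Theta(x,z)) \nonumber \\
&= (1 - p)\,\frac{e^{-(\ln R_\delta - \mu_c)^2/2\sigma_c^2}}{R_\delta\sqrt{2\pi}\,\sigma_c} + p\,\frac{e^{-(R_\delta - \mu_f)^2/2\sigma_f^2}}{\sqrt{2\pi}\,\sigma_f}. \tag{7}
\end{align}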

This gives the probability that a galaxy with magnitude $M_r < x$ at redshift $z$ has a local dark matter density $R_\delta$. Each of this relation's five free parameters, $\Theta(x,z) = \{\mu_c,\sigma_c,\mu_f,\sigma_f,p\}$, is a function of galaxy absolute magnitude and redshift, and the dependence of $\Theta(x,z)$ on these variables is modeled using a Gaussian process as described in appendix B.
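For concreteness, the bi-modal form of eq. (7) can be evaluated directly. The sketch below implements the log-normal plus Gaussian mixture and checks numerically that it integrates to one. The parameter values are arbitrary illustrative choices, not fitted values of $\Theta(x,z)$.

```python
import math


def p_r_delta(r, mu_c, sigma_c, mu_f, sigma_f, p):
    """Eq. (7): (1 - p) * lognormal(r; mu_c, sigma_c) + p * normal(r; mu_f, sigma_f)."""
    lognormal = math.exp(-((math.log(r) - mu_c) ** 2) / (2.0 * sigma_c ** 2)) / (
        r * math.sqrt(2.0 * math.pi) * sigma_c
    )
    normal = math.exp(-((r - mu_f) ** 2) / (2.0 * sigma_f ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma_f
    )
    return (1.0 - p) * lognormal + p * normal


def integrate(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n intervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


# Illustrative parameters only (R_delta in h^-1 Mpc).
theta = dict(mu_c=math.log(0.3), sigma_c=0.4, mu_f=2.0, sigma_f=0.4, p=0.5)
total = integrate(lambda r: p_r_delta(r, **theta), 1e-4, 10.0)
```

Because the Gaussian component formally extends to negative $R_\delta$, the mixture is normalized on $R_\delta > 0$ only when that tail is negligible, as it is for the values chosen here.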

Figure 2 shows the distribution of $p(R_\delta)$ for bins in galaxy magnitude and redshift (left) and galaxy magnitude and host halo mass (right). The full distribution $p(R_\delta|M_r < x,z)$ in bins of magnitude and redshift in the input SHAM model (points) is well reproduced by the ADDGALS model applied to the T1 simulation (blue line). The reduced chi-squared values for these fits can be large, $\mathcal{O}(10-100)$, but the median absolute deviation is less than 2.5% for all redshift and magnitude bins.

The orange and green lines in this figure show the same distributions for central and satellite galaxies in the SHAM, to indicate which region of this distribution they populate. At low luminosity (bottom rows), these two populations are easily separated by density. At higher luminosity (top row), the two populations cannot be distinguished by density; this motivates separate modeling of bright central galaxies through $\langle M_{r,\rm cen}\rangle(M_{\rm vir})$ so that central galaxies can be distinguished from bright satellites in massive systems.

The right side of fig. 2 shows the distribution of $p(R_\delta)$ in bins of halo mass and galaxy magnitude at redshift $z = 0$, $p(R_\delta|M_{\rm vir},M_r < x,z = 0)$. Splitting the distribution in this way gives intuition for how assigning galaxies by $R_\delta$ can approximate assignment by halo mass, even in simulations that are lower resolution than the relevant resolved halos. At high mass (right column), we see that this distribution is highly peaked towards small $R_\delta$ (high densities), and satellites and centrals are easily separated in $R_\delta$ space. The movement of this peak with mass is what enables assignment by $R_\delta$ to distinguish between different halo masses. The smoothing scale used here, $1.8\times10^{13}\,h^{-1}M_\odot$, effectively distinguishes more biased halos above the smoothing scale from lower mass halos below the smoothing scale (leftmost column), where halo bias is relatively flat. For halos below the mass smoothing scale, the distribution $p(R_\delta|M_{\rm vir},M_r < x,z)$ broadens, and there is much more scatter in $M_{\rm vir}$ when assigning by $R_\delta$. Due to this broadening, the ADDGALS algorithm is susceptible to scattering galaxies between different halo masses at fixed $R_\delta$. This can lead to Eddington-like biases in halo occupation distributions in ADDGALS, where galaxies that should be placed in halos with masses less than the smoothing mass scatter up into halos with masses approximately at the smoothing mass. However, in this regime, halo bias is relatively flat so this scatter does not significantly impact the projected clustering signals in ADDGALS.

3.4. Algorithm Overview

The algorithm steps can be summarized as follows:

1. High-resolution modeling: apply a SHAM model to high-resolution N-body snapshots

2. High-resolution training:

(a) calibrate the luminosity–density–redshift relation, $p(R_\delta|M_r,z)$, from the SHAM model

(b) calibrate the central luminosity–halo mass relation, $p(M_{r,\rm cen}|M_{\rm vir})$, from the SHAM model

3. Populating lightcone simulations:

(a) populate central galaxies based on $p(M_{r,\rm cen}|M_{\rm vir})$

(b) populate the rest of the galaxies based on $p(R_\delta|M_r,z)$ and the luminosity function

The final result is a synthetic galaxy catalog containing positions, velocities, and single-band photometric information. Next we validate the steps above using observations from the SDSS.

4. VALIDATION OF THE LUMINOSITY–DENSITY ASSIGNMENT

Here, we present a number of tests validating the ability of ADDGALS to reproduce the properties of the T1 SHAM model in the lower-resolution L1 simulation. Unless otherwise noted, the tests in this section compare an ADDGALS catalog run on the $z = 0$ snapshot output of the L1 simulation with a SHAM catalog run on the $z = 0$ snapshot output of the T1 simulation.

The left side of fig. 3 compares the projected correlation function of the T1 ADDGALS, L1 ADDGALS, and L2 ADDGALS catalogs with the T1 SHAM catalog and the SDSS measurements presented in Reddick et al. (2013). The function $w_p(r_p,\pi)$ is measured in the snapshots using the Landy–Szalay (Landy & Szalay 1993) estimator, i.e. $w_p(r_p,\pi) = (DD - 2DR + RR)/RR$, using 13 logarithmically spaced bins in $r_p$ between $0.1\,h^{-1}{\rm Mpc}$ and $40\,h^{-1}{\rm Mpc}$, subsequently integrating these measurements along the line-of-sight out to $\pi_{\rm max} = 60\,h^{-1}{\rm Mpc}$ to obtain $w_p(r_p)$. Ten times as many random points as galaxies are used to estimate $DR$ and $RR$, where the randoms are distributed uniformly in each sub-volume. Errors are estimated via jackknife using 64 sub-volumes for each simulation.
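The estimator described above can be written out in a few lines. The sketch below assumes raw pair counts in a single $(r_p,\pi)$ bin and normalizes them by the total number of available pairs, which is the standard convention but is not spelled out in the text. The line-of-sight integration likewise assumes uniform bins in $\pi$ on $[0,\pi_{\rm max}]$ and the usual factor of two for the symmetric negative-$\pi$ side.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimate from raw pair counts in one bin.

    dd and rr count unique data-data and random-random pairs;
    dr counts data-random cross pairs.
    """
    dd_norm = dd / (n_data * (n_data - 1) / 2.0)
    dr_norm = dr / (n_data * n_rand)
    rr_norm = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_norm - 2.0 * dr_norm + rr_norm) / rr_norm


def projected_wp(xi_of_pi, d_pi):
    """w_p(r_p) = 2 * sum_i xi(r_p, pi_i) * d_pi over bins from 0 to pi_max."""
    return 2.0 * d_pi * sum(xi_of_pi)


# Toy counts: 100 galaxies and 1000 randoms (ten times as many, as in the text).
xi_unclustered = landy_szalay(dd=49.5, dr=1000.0, rr=4995.0, n_data=100, n_rand=1000)
xi_clustered = landy_szalay(dd=99.0, dr=1000.0, rr=4995.0, n_data=100, n_rand=1000)
```

In the first toy case the normalized counts are identical and the estimator returns zero; doubling the data-data pairs at fixed random counts raises it to one.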

We use the best-fit SHAM model from Lehmann et al. (2017), and as such the agreement between the SHAM catalog and SDSS is good, albeit with relatively large errors on the SHAM measurements for the brighter samples. Details of how the SHAM catalog is constructed are described in appendix A. The ADDGALS catalog and the SHAM catalog are consistent with each other at most scales and magnitudes. Discrepancies between the ADDGALS catalogs and the SDSS data can be seen in the $M_r < -22$ and $M_r < -19$ measurements. Given the large errors on the SHAM catalog for the $M_r < -22$ magnitude cut, it is unclear whether this discrepancy is due to a disagreement between ADDGALS and the SHAM model, or whether ADDGALS and the SHAM model agree well, and the SHAM model disagrees with the data. Lehmann et al. (2017) find a marginal preference for lower scatter at brighter luminosities, which is consistent with the latter of these two possibilities. We also make use of a slightly different luminosity function than Lehmann et al. (2017), with the main difference coming at the brightest end, where our luminosity function has a shallower slope. This may also lead to a reduced clustering amplitude in the $M_r \le -22$ bin.

For $M_r < -19$, ADDGALS and SHAM catalogs are in good statistical agreement for $r_p > 1\,h^{-1}{\rm Mpc}$, but deviate significantly from SDSS at scales $r_p < 1\,h^{-1}{\rm Mpc}$. The SHAM model suffers from artificial subhalo disruption in this regime, leading to lower small-scale clustering, and the ADDGALS model has inherited this issue through the $p(R_\delta|M_r,z)$ distribution. Differences between the ADDGALS catalogs due to differences in simulation resolution are discussed in section 8.

The right hand side of fig. 3 compares the behavior of galaxy bias between the ADDGALS and SHAM catalogs. The bias measurements are made by taking the ratio of $w_p(r_p)$ for galaxies and that measured on the matter distribution in each respective simulation. Given the agreement between the projected correlation functions of the SHAM model and ADDGALS, the agreement seen here is expected. A notable feature in this figure is the scale at which the different samples conform to a linear bias model, i.e. $\delta_g(r) = b_1\delta_m(r)$, on large scales. For the fainter $M_r < -19$ sample, galaxy bias becomes linear in both catalogs for scales with $r_p > 4\,h^{-1}{\rm Mpc}$. For the brighter $M_r < -21$ sample, the SHAM catalog also appears to behave linearly for $r_p > 4\,h^{-1}{\rm Mpc}$. The ADDGALS measurements are fully consistent with the noisier SHAM measurements, but the smaller error bars in the ADDGALS measurements show hints of non-linear bias out to slightly larger scales. This result is expected for this more massive galaxy sample.

[Figure 2 panels: $p(R_\delta)$ versus $R_\delta$ [$h^{-1}$ Mpc]. Left panels are labeled by ($M_r$, $z$): (−21.8, 0.0), (−21.8, 0.3), (−21.8, 0.7), (−20.5, 0.0), (−19.3, 0.0); legend: Addgals, SHAM. Right panels are labeled by ($M_r$, $\log M_{\rm vir}$): (−21.8, 12.0), (−21.8, 13.0), (−21.8, 14.0), (−19.9, 12.0), (−18.0, 12.0); legend: SHAM Total, Addgals Total, SHAM Centrals, SHAM Satellites.]

Figure 2. PDF of dark matter densities, characterized by $R_\delta$, the radius enclosing a mass of $1.8\times10^{13}\,h^{-1}M_\odot$, for various galaxy populations in the simulations. Both panels compare the ADDGALS distribution (blue lines) to the SHAM distribution (black points) in the T1 simulation. Dashed green and orange lines represent the $R_\delta$ distributions for satellite and central galaxies, respectively, in the SHAM catalog. Left panel compares the $R_\delta$ distributions for galaxies, binned by absolute magnitude (rows), residing in all halo masses at different redshifts (columns). Fainter central galaxies tend to live in less massive halos, corresponding to large $R_\delta$, while satellites are hosted by more massive halos corresponding to small $R_\delta$, leading to the observed bi-modality in $p(R_\delta)$. Right panel compares the $R_\delta$ distributions at $z = 0$ for galaxies as a function of absolute magnitude (rows) and host halo mass (columns). The agreement shows that ADDGALS is able to successfully reproduce the density distribution of the SHAM model.

[Figure 3 panels: Left: $r_p w_p(r_p)$ versus $r_p$ [$h^{-1}$ Mpc] for $M_r < -22$, $M_r < -21$, $M_r < -20$, and $M_r < -19$; legend: SHAM, Addgals T1, Addgals L1, Addgals L2, SDSS. Right: $b(r)$ versus $r_p$ [$h^{-1}$ Mpc]; legend: SHAM, Addgals L1.]

Figure 3. Comparison of ADDGALS clustering and bias with that measured in the SHAM model it is tuned to. Left: Projected correlation functions for a SHAM model applied to the $z = 0$ snapshot of the T1 simulation, and ADDGALS models trained on that SHAM run on the $z = 0$ snapshots of the T1, L1, and L2 simulations. Shaded regions and error bars are the $1\sigma$ errors, estimated via a jackknife procedure described in the text. Each panel shows a different magnitude threshold. The corresponding SDSS measurements from Reddick et al. (2013) are shown for comparison. Agreement between ADDGALS catalogs based on different resolution simulations and the SHAM model is generally good. Discrepancies between the small-scale simulation measurements and data are discussed in section 4. Right: Bias as a function of scale for the L1 ADDGALS catalogs and the SHAM model it is tuned to, indicating good agreement. Solid lines are for all galaxies with $M_r < -21$ and dashed lines are $M_r < -19$.

We compare the radial profiles of galaxies around host halos between the ADDGALS and SHAM catalogs in fig. 4. The measurements from ADDGALS run on the T1 (light blue), L1 (dark blue), and L2 (orange) simulations are included. All curves are normalized so that the SHAM radial profile equals one on the largest scale in the figure. We indicate where we expect resolution effects in the matter density profiles of the host halos by plotting curves below this scale with a dashed line. This scale is approximated by five times the force softening scale used in each simulation.

We see that these radial profile measurements exhibit significant differences between ADDGALS and SHAM. Above $\sim 200\,h^{-1}{\rm kpc}$ the two models agree well, as expected from the agreement between projected correlation functions for the two models. Below this scale, in both mass bins shown here, the SHAM catalog experiences a flattening and becomes inconsistent with the expectation of profiles with Navarro–Frenk–White (NFW) functional forms shown by the black line (Navarro et al. 1996). The NFW prediction shown here assumes the mean host mass in the bin and the mass–concentration relation from Diemer & Joyce (2019), and is normalized so that it matches the SHAM curves at $R_{\rm vir}$. The reason for this deviation from an NFW expectation for SHAM is likely artificial subhalo disruption for halos which have close pericentric passages (van den Bosch et al. 2018; van den Bosch & Ogiya 2018).

We can understand the behavior of the ADDGALS measurements in the following way. Under the assumptions of the ADDGALS algorithm, it is possible to write the radial profile of galaxies of absolute magnitude $M_r$ in a halo of mass $M_h$ as

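\begin{align}
\rho(r|M_r,M_h) &\propto \int p(r|R_\delta,M_h)\,p(R_\delta|M_r)\,dR_\delta \tag{8} \\
&= \int \frac{p(r|M_h)\,p(R_\delta|r,M_h)}{p(R_\delta|M_h)}\,p(R_\delta|M_r)\,dR_\delta. \tag{9}
\end{align}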

Here $p(r|R_\delta,M_h)$ is the PDF of matter particle distances to centers of halos of mass $M_h$, given that the particle has a local density measurement of $R_\delta$. The first line above follows from the fact that ADDGALS assigns a galaxy with absolute magnitude $M_r$ to a random particle with density $R_\delta$, where $R_\delta$ is a random draw from $p(R_\delta|M_r)$. The second line above is obtained from the first via application of Bayes' theorem.

We see that we can express ADDGALS radial profiles in terms of $p(R_\delta|M_r)$; the normalized radial profile of matter in halos of mass $M_h$, $p(r|M_h)$; the distribution of $R_\delta$ as a function of halo-centric radius $r$, $p(R_\delta|r,M_h)$; and $p(R_\delta|M_h)$. Note that in the limit where $p(R_\delta|r,M_h)$ is constant as a function of $r$, $\rho(r|M_r,M_h) \propto p(r|M_h)$. This is the case on small scales ($\lesssim 200\,h^{-1}{\rm kpc}$), where $M_*$, the mass scale used to calculate $R_\delta$, is significantly larger than the mass enclosed within radius $r$. This makes it clear that the flattening of the slope in the ADDGALS L1 and L2 catalogs on small scales in the lower mass bin of fig. 4 is due to a flattening in the actual matter profiles in halos at those scales, which is close to five times the force softening radii used in these simulations, where such a turnover is expected (DeRose et al. 2019b). When using a higher-resolution simulation like T1, this flattening is no longer seen.
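The limit quoted above can be made explicit with one intermediate step. The lines below are a short worked version that uses only the quantities already defined in eqs. (8) and (9), together with the assumption that $p(R_\delta|M_r)$ is normalized.

```latex
% If p(R_\delta | r, M_h) does not depend on r, marginalizing over r gives
p(R_\delta|M_h) = \int p(R_\delta|r,M_h)\,p(r|M_h)\,dr = p(R_\delta|r,M_h),
% so the ratio inside eq. (9) is unity and
\rho(r|M_r,M_h) \propto p(r|M_h)\int p(R_\delta|M_r)\,dR_\delta = p(r|M_h).
```

In this regime the galaxy profile simply traces the matter profile, independent of $M_r$.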

For the lower mass bin, this implicit smoothing scale is approximately the same size as $R_{\rm vir}$, and so the ADDGALS profiles approximate an NFW profile well for the entire one-halo term. For the higher mass bin, the smoothing scale is significantly less than $R_{\rm vir}$ and so the one-halo term exhibits significantly more complicated behavior. Above the smoothing scale, the ADDGALS profiles track the SHAM catalog profiles well. At scales less than $R_{\rm vir}$ but greater than the smoothing scale, the SHAM catalog is significantly flattened by artificial subhalo disruption, and so on these scales the ADDGALS catalog inherits the same flattened profile. On scales below the smoothing scale, the ADDGALS catalog reverts to being proportional to the matter profile, leading to the observed upturn relative to the SHAM catalog. This flattening of the ADDGALS profiles at high mass has significant implications for optical galaxy cluster finding, as it leads to a deficit of galaxy number densities in massive clusters with respect to that observed in data, as discussed in section 7.2. Additional validation of the ADDGALS catalogs is provided in appendix D.

We note that a precise match between $p(R_\delta|M_r,z)$ in ADDGALS and SHAM is extremely important in achieving the level of agreement seen in the above comparisons. Even relatively small changes in this distribution make these matches appreciably worse. These comparisons demonstrate the real strength of the ADDGALS algorithm. Here, we are able to produce simulated galaxy catalogs that are complete in absolute magnitude and reproduce observed galaxy clustering properties using an N-body simulation that has a particle number density only a factor of $\sim 100$ higher than the galaxy density. Additionally, tests have shown that such an agreement is possible for simulations with significantly lower resolution, where the number densities differ by a factor of only $\sim 20$. The ability to create a realistic galaxy distribution on such modest resolution simulations allows the creation of very large volume synthetic catalogs, appropriate for modeling large photometric and spectroscopic surveys such as SDSS, DES, and LSST, without resorting to any sort of replication techniques or highly expensive simulations, as would be required for most other algorithms.

5. ASSIGNMENT OF GALAXY SEDS


[Figure 4 panels: $r^2\rho_g(r)$ versus $r$ [$h^{-1}$ kpc]. Rows: $13 \le \log_{10} M_{\rm vir} < 14$ and $14 \le \log_{10} M_{\rm vir} < 15$. Columns: $M_r < -19$, $M_r < -20$, $M_r < -21$. Legend: NFW, SHAM, Addgals T1, Addgals L1, Addgals L2.]

Figure 4. Radial profiles of galaxies in group- and cluster-sized halos for SHAM, ADDGALS T1, ADDGALS L1, and ADDGALS L2 catalogs. Columns show different absolute magnitude cuts given by the labels in each panel, and rows show different halo mass bins. Each panel is normalized such that the SHAM curves pass through 1 on the largest scale plotted; uncertainties for each measurement are estimated using jackknife. The lines transition from solid to dashed at five times the force softening length of each respective simulation in order to approximate the scale where we expect resolution to affect the matter profiles of these halos. The black line indicates an NFW profile for the mean halo mass in the bin, using the Diemer & Joyce (2019) mass–concentration model and normalized to match the SHAM profiles at $R_{\rm vir}$. Agreement between the catalogs is generally good at scales larger than $200\,h^{-1}{\rm kpc}$, which approximately corresponds to the scale imposed by the mass used to calculate $R_\delta$. This scale is relatively independent of halo mass, modulo changes in the mean halo concentration as a function of halo mass. Below this scale, $R_\delta$ becomes approximately constant and the SHAM and ADDGALS profiles no longer track each other. At scales less than this smoothing scale in the less massive bin, the ADDGALS profiles approximate NFW profiles much more closely than the SHAM catalog, which is affected by artificial subhalo disruption on these scales. On the smallest scales depicted here, the ADDGALS profiles begin to deviate near the resolution limits of each respective simulation, with the slope of the profiles turning over in a characteristic manner. In the more massive bin, none of the catalogs is well described by an NFW profile. At these masses, the ADDGALS smoothing scale can probe smaller fractions of the halo virial radius, and subhalo disruption effects that become important for satellite galaxies in the SHAM are inherited by $p(R_\delta|M_r)$, and thus the ADDGALS catalogs. This causes deviations from NFW profiles for all catalogs above the ADDGALS smoothing scale. For a more quantitative discussion, see section 4.

Once the galaxies have been populated with phase-space positions and $r$-band luminosities, we assign SEDs to each galaxy in the second part of the ADDGALS algorithm. While the SED assignment algorithm was developed in conjunction with the galaxy assignment method discussed above, the algorithm is independent and able to operate on any galaxy catalog that already has absolute magnitudes defined in one band. This part of the algorithm, which is referred to as ADDSEDS (Adding Density-Determined SEDs) when used on its own, has been used independently from the first step of ADDGALS in previous works, based on earlier versions of the present algorithm (see, e.g., Gerke et al. 2013; Mao et al. 2018).

ADDSEDS assumes that galaxy SEDs are set by both absolute magnitude and galaxy environment and uses a training set consisting of the SDSS DR7 VAGC (Blanton et al. 2005), whose SEDs are mapped onto the simulated galaxies. We cut the training set to $0.005 < z < 0.2$, since the bright, higher redshift objects and very faint low redshift objects represent a biased sample with respect to the rest of the population of galaxies. The final training set consists of approximately 600,000 spectroscopic galaxies from the SDSS main sample.

For each galaxy in the simulated galaxy catalog, the set of SDSS galaxies in a bin of $M_r$ around that simulated galaxy is identified, and one SED is randomly chosen from this set and assigned to the simulated galaxy. The assumed bin width depends on $M_r$ as $\Delta M_r = \Delta M_{r,0}(22.5 + M_r)$, where $\Delta M_{r,0} = 0.1$. If no galaxy in SDSS is found, the bin width is relaxed to $\Delta M_r = \Delta M_{r,0}(22.5 + M_r)^2$; the latter criterion always enables a match to be found. These bin widths have been tuned in order to minimize discreteness effects in color space due to assigning the same SDSS SED repeatedly to many simulated galaxies.
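The two-stage bin selection can be sketched as follows. The bin is treated here as centred on the simulated magnitude with full width $\Delta M_r$, which is an assumption, since the text does not say whether the quoted width is a full or half width. The function name is hypothetical.

```python
import random

DELTA_MR_0 = 0.1  # Delta M_r,0 from the text


def pick_training_index(mr_sim, training_mr, rng):
    """Pick a random training galaxy whose M_r lies in the bin around mr_sim.

    The linear bin width is tried first and the quadratic one second,
    following the two expressions quoted in the text.
    """
    base = 22.5 + mr_sim
    for width in (DELTA_MR_0 * base, DELTA_MR_0 * base ** 2):
        half = 0.5 * width
        matches = [i for i, m in enumerate(training_mr) if abs(m - mr_sim) <= half]
        if matches:
            return rng.choice(matches)
    return None  # no training galaxy in either bin


rng = random.Random(7)
first_pass = pick_training_index(-20.0, [-20.05, -20.4, -18.0], rng)
relaxed = pick_training_index(-20.0, [-20.3, -18.0], rng)
```

At $M_r = -20$ the two widths are 0.25 and 0.625 mag, so the second call only finds its match after relaxing the bin. Read literally, the quadratic expression is narrower than the linear one for $M_r < -21.5$, so the treatment of the brightest galaxies in this sketch may differ from the actual implementation.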

After this initial step, an environmental dependence of the SED assignment is imparted on the catalog. This is accomplished by correlating rest frame $g - r$ color with a local density proxy that can be accurately measured in modest resolution N-body simulations. The proxy that we have found to work well is the distance between the galaxy in question and the nearest halo above a mass threshold, $M_{h,\rm cut}$. We refer to this proxy as $R_h$.

In detail, $g - r$ colors are mapped onto the simulated galaxies by enforcing the following ansatz:

\begin{equation}
P(< g - r\,|\,M_r) = P(< \tilde{R}_h\,|\,M_r) \tag{10}
\end{equation}

where $\tilde{R}_h$ is a noisy version of $R_h$, and the Pearson correlation coefficient between ${\rm Rank}(R_h)$ and ${\rm Rank}(\tilde{R}_h)$ is set to $r_{\rm corr}$. In doing so, we allow for an imperfect correlation between $R_h$ and $g - r$, which is necessary to match observed clustering statistics as a function of $g - r$. In order to reduce discreteness effects, $P(< g - r|M_r)$ and $P(< \tilde{R}_h|M_r)$ are computed in sliding windows around each galaxy, such that the width of the window in $M_r$ yields 100 galaxies with which to estimate the above distributions. The values for $M_{h,\rm cut}$ and $r_{\rm corr}$ are free parameters of the model, tuned to reproduce color dependent clustering in SDSS. This work uses the best fit values of these parameters from DeRose et al. (2021a), where additional implementation details of this model can be found.
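A minimal sketch of this rank matching within a single magnitude window is given below. Several choices are assumptions made for illustration: the noise is added to $\ln R_h$ as a Gaussian whose width would be tuned until the rank correlation reaches $r_{\rm corr}$, the training colors and simulated galaxies are taken to be equal in number, and the direction of the matching (redder colors closer to massive halos) is exposed as a flag rather than fixed.

```python
import math
import random


def ranks(values):
    """Rank of each entry (0 = smallest); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def noisy_proxy(r_h, noise, rng):
    """Noisy proxy: ln(R_h) plus Gaussian noise of width `noise`."""
    return [math.log(r) + rng.gauss(0.0, noise) for r in r_h]


def match_colors(r_tilde, training_colors, redder_is_closer=True):
    """Assign training colors so that the color rank follows the proxy rank."""
    sorted_colors = sorted(training_colors, reverse=redder_is_closer)
    return [sorted_colors[k] for k in ranks(r_tilde)]


r_h = [0.5, 0.1, 2.0]  # toy distances to the nearest massive halo
colors = match_colors(r_h, [0.3, 0.9, 0.6])
rank_corr = pearson(ranks(r_h), ranks(noisy_proxy(r_h, 0.0, random.Random(0))))
```

With zero noise the ranks of $R_h$ and $\tilde{R}_h$ coincide and the rank correlation is one; increasing the noise lowers it, which is the handle that $r_{\rm corr}$ provides.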

This is very similar in spirit to Conditional Abundance Matching (CAM) models that first assign magnitudes to halos via abundance matching, and then assign colors to galaxies at fixed magnitude by making the ansatz that ${\rm Rank}(X_h) \sim {\rm Rank}(g - r)$, where $X_h$ is usually taken to be a dark matter halo property such as formation time or accretion rate (Masaki et al. 2013; Hearin et al. 2014b; Watson et al. 2015).

Once a $g - r$ value from SDSS is assigned to each galaxy, the SED associated with that $g - r$ value is mapped onto the simulated galaxy as well. The galaxy SEDs are represented as five SED template coefficients, $\alpha_i$, using the templates from the KCORRECT algorithm (Blanton et al. 2003a) as the basis. Since some tolerance in $M_r$ is allowed in the match between simulated and data galaxies, the SED obtained from the data must be normalized such that it gives the absolute magnitude originally assigned to the simulated galaxy. This is accomplished by re-normalizing the KCORRECT coefficients such that

\begin{equation}
-2.5\log_{10}\frac{\alpha'_i}{\alpha_i} = M_{r,\rm sim} - M_{r,\rm train} \tag{11}
\end{equation}

where $\alpha'_i$ are the re-normalized coefficients, $M_{r,\rm sim}$ is the $r$-band absolute magnitude assigned to the simulated galaxy, and $M_{r,\rm train}$ is the absolute magnitude of the matched training set galaxy. Using these coefficients, the SEDs can be integrated over filter bandpasses in order to produce observed galaxy magnitudes.
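Solving eq. (11) for $\alpha'_i$ gives a single multiplicative factor that is applied to all five coefficients. A minimal sketch, with a hypothetical function name and toy coefficient values:

```python
import math


def renormalize_coeffs(coeffs, mr_sim, mr_train):
    """Scale template coefficients so the SED has absolute magnitude mr_sim (eq. 11)."""
    scale = 10.0 ** (-0.4 * (mr_sim - mr_train))
    return [scale * a for a in coeffs]


# A simulated galaxy 2.5 mag brighter than its training match is 10x more luminous.
alpha = [1.0, 2.0, 0.0, 0.5, 4.0]
alpha_prime = renormalize_coeffs(alpha, mr_sim=-21.5, mr_train=-19.0)
```

Because every coefficient is multiplied by the same factor, the shape of the SED, and hence every rest-frame color, is unchanged by this step.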

Here we emphasize a few key points of the algorithm. First, we note that the training set SEDs are mapped to synthetic galaxies without using redshift information. When combined with the magnitude limit of our training set, this means that the number of galaxies in the training set varies strongly as a function of $M_r$, with the highest density at $M_r \sim -20$. Additionally, no galaxy evolution models are applied. In particular, the only two ways in which we account for redshift evolution of colors are via evolution of galaxy magnitudes before the SED is selected (which generally brightens galaxies with redshift), and via redshifting of the SED to determine the observed galaxy magnitudes. We also apply a redshift- and luminosity-dependent evolution of the red fraction of galaxies as described in Appendix E.2 of DeRose et al. (2019a), although this is not important for the SDSS comparisons presented here. We assume both that the typical rest-frame colors of galaxies are unchanged and that the color–environment–luminosity relation remains unchanged as a function of redshift. Both of these assumptions are certainly incorrect in detail. Given these limitations, the photometry that is produced at high redshift should be treated with caution. Future development will address these issues. Nonetheless, these corrections work extremely well in reproducing galaxy properties over the redshift range of SDSS. In a companion paper extending this algorithm to higher redshifts in the context of DES (DeRose et al. 2019a), we addressed one aspect of galaxy evolution by adjusting the relative fraction of red and blue galaxies (note that this was applied to a slightly older version of ADDSEDS).

5.1. Summary of SED Assignment Algorithm

We briefly summarize the most important steps in our SED assignment algorithm:

1. Compile a training set of spectroscopic galaxies.

2. Calculate the distance to the nearest massive halo, $R_h$, for each simulated galaxy.

3. Conditional abundance match $g - r$ colors in the training set to $\tilde{R}_h$, a noisy version of $R_h$.

4. Use the CAM relationship to map SEDs from the training set galaxy to the synthetic galaxy.

5. Redshift the SED and convolve with filter pass bands to determine observed magnitudes.

Applying this algorithm results in a synthetic photometric catalog down to some limiting absolute magnitude. Note that when attempting to model a magnitude-limited survey, such as SDSS, it is necessary to generate synthetic galaxies to some limiting absolute magnitude, $M_{r,\rm lim}(z)$, as a function of redshift. Because of the significant scatter in the $M_r - m_r(z)$ relation due to variation in galaxy SEDs, it is generally necessary to create significantly more galaxies than necessary, cutting the catalog to the appropriate apparent magnitude limit in a post-processing step.

6. OBSERVED MAGNITUDES AND PHOTOMETRIC ERRORS

One of the main strengths of the ADDGALS algorithm is its ability to produce photometry in a number of different bands using empirically determined SEDs. In order for these magnitudes to be useful, it is often necessary to include photometric error estimates. These errors cause objects above the detection threshold to scatter out of our detection limits as well as causing many more faint objects to scatter in. Modeling these errors appropriately can thus be important for analyses that use galaxies close to the detection limits of their respective surveys, which is the case for most weak lensing analyses.

A significant challenge is the construction of an appropriate error model that is consistent between surveys. Existing surveys report limiting magnitudes in several inconsistent ways. For example, some surveys report $5\sigma$ galaxy magnitudes, some report $10\sigma$ point source magnitudes, and still others report limiting magnitudes by measuring the 80% completeness limit.

To ensure that errors are consistently defined, we have instead taken a pragmatic approach to calculate galaxy limiting magnitudes. For all existing surveys, we remeasure the $10\sigma$ limiting magnitude ($m_{\rm lim}$) given the reported galaxy photometric errors using the algorithm described in Rykoff et al. (2015). To match the full magnitude/error distribution, we also measure an effective exposure time ($t_{\rm eff}$) and an additional parameter ($\Sigma_{\rm int}$, described below) to encompass variations in survey depth, seeing, and galaxy size.

Synthetic photometric errors are calculated using a relatively straightforward method of calculating the Poisson noise for the flux of a simulated galaxy plus the sky noise in a particular band. Here, the total signals from these two sources (galaxy and sky) are given by the relation:

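\begin{equation}
\begin{aligned}
S_{\rm gal} &= 10^{-0.4(m_{\rm gal} - ZP)} \times t_{\rm eff} \\
S_{\rm sky} &= f_{\rm sky} \times t_{\rm eff},
\end{aligned} \tag{12}
\end{equation}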

where $m_{\rm gal}$ is the magnitude of a galaxy, $f_{\rm sky}$ is the sky noise (in a particular band), and $t_{\rm eff}$ is the effective exposure time. In all cases the zero-point is set to $ZP = 22.5$, and all fluxes in the data tables are converted to nanomaggies such that:

\begin{equation}
m = 22.5 - 2.5\log_{10} f_{\rm nmgy}. \tag{13}
\end{equation}

Finally, we note that the sky noise parameter, $f_{\rm sky}$, can be estimated from the $10\sigma$ limiting magnitude $m_{\rm lim}$ and the associated $f_{\rm lim}$:

\begin{equation}
f_{\rm sky} = \frac{f_{\rm lim,1}^2 \times t_{\rm eff}}{100} - f_{\rm lim,1}, \tag{14}
\end{equation}

where $f_{\rm lim,1}$ is the 1-second flux at the limiting magnitude given by eq. (12).
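Equation (14) follows from requiring a signal-to-noise ratio of 10 at the limiting magnitude, $S_{\rm gal}/\sqrt{S_{\rm gal} + S_{\rm sky}} = 10$. The sketch below evaluates eq. (14) and checks this property numerically. The limiting magnitude and exposure time are arbitrary illustrative values, not survey parameters from this work.

```python
import math

ZP = 22.5  # zero-point used throughout the text


def flux_per_second(mag):
    """1-second flux for a given magnitude, from eq. (12) with t_eff = 1."""
    return 10.0 ** (-0.4 * (mag - ZP))


def sky_flux(m_lim, t_eff):
    """Sky noise parameter f_sky from eq. (14)."""
    f_lim1 = flux_per_second(m_lim)
    return f_lim1 ** 2 * t_eff / 100.0 - f_lim1


def signal_to_noise(mag, m_lim, t_eff):
    """S/N of a galaxy of magnitude `mag`, using sqrt(S_gal + S_sky) as the noise."""
    s_gal = flux_per_second(mag) * t_eff
    s_sky = sky_flux(m_lim, t_eff) * t_eff
    return s_gal / math.sqrt(s_gal + s_sky)


snr_at_limit = signal_to_noise(23.0, m_lim=23.0, t_eff=1000.0)
snr_brighter = signal_to_noise(21.0, m_lim=23.0, t_eff=1000.0)
```

Note that eq. (14) yields a positive $f_{\rm sky}$ only when $f_{\rm lim,1}\,t_{\rm eff} > 100$, that is, when the galaxy alone would exceed a signal-to-noise ratio of 10 in the absence of sky.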

Given the galaxy flux and sky flux, in the simplest form the typical noise associated with each galaxy will be given by a random draw from a distribution of width $\sigma_{\rm flux} = \sqrt{S_{\rm gal} + S_{\rm sky}}$.

However, in a simple model to account for variations in galaxy size, survey depth, and seeing, we add in an additional log-normal scatter parameter $\Sigma_{\rm int}$. After taking $\sigma_{\rm int}$ as a random draw from a distribution of width $\Sigma_{\rm int}$, we arrive at:

\begin{equation}
\sigma_{\rm flux,tot} = \exp(\ln\sigma_{\rm flux} + \sigma_{\rm int}). \tag{15}
\end{equation}

For most surveys, the typical value for $\Sigma_{\rm int}$ is ∼ 0.2 − 0.3, equivalent to a 20–30% scatter in effective depth. For particular survey applications such as in DeRose et al. (2019a), we use maps of the effective depth variation as a function of sky position in order to more realistically model these variations.

After taking a random draw from a distribution of width $\sigma_{\rm flux,tot}$ for each galaxy, the total observed flux and error are converted to nanomaggies, such that $f_{\rm nmgy} = S_{\rm gal,obs}/t_{\rm eff}$. Finally, magnitudes and magnitude errors are calculated as:

$$ m_{\rm obs} = 22.5 - 2.5\log_{10}(f_{\rm nmgy}), \qquad m_{\rm err,obs} = \frac{2.5}{\ln(10)}\,\frac{f_{\rm err,nmgy}}{f_{\rm nmgy}}. \tag{16} $$
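The chain of eqs. (12)-(16) can be sketched in a few lines of Python. This is an illustrative sketch rather than the ADDGALS implementation: the function names are hypothetical, and it assumes Gaussian draws for both the flux noise and $\sigma_{\rm int}$ (the text specifies only the widths of these distributions).

```python
import numpy as np

ZP = 22.5  # zero-point quoted in the text


def sky_flux_from_mlim(m_lim, t_eff):
    """Eq. (14): per-second sky flux implied by a 10-sigma limiting magnitude."""
    f_lim1 = 10.0 ** (-0.4 * (m_lim - ZP))  # 1-second flux at m_lim, eq. (12)
    return f_lim1**2 * t_eff / 100.0 - f_lim1


def add_photometric_noise(m_true, m_lim, t_eff, sigma_int_width, rng):
    """Return (m_obs, m_err_obs) for true magnitudes m_true, following eqs. (12)-(16)."""
    m_true = np.asarray(m_true, dtype=float)
    f_sky = sky_flux_from_mlim(m_lim, t_eff)
    if f_sky < 0:
        raise ValueError("m_lim and t_eff imply a negative sky flux")
    s_gal = 10.0 ** (-0.4 * (m_true - ZP)) * t_eff      # eq. (12)
    s_sky = f_sky * t_eff
    sigma_flux = np.sqrt(s_gal + s_sky)
    # Assumption: sigma_int is a zero-mean Gaussian draw of width Sigma_int.
    sigma_int = rng.normal(0.0, sigma_int_width, size=m_true.shape)
    sigma_tot = np.exp(np.log(sigma_flux) + sigma_int)  # eq. (15)
    # Assumption: the flux noise itself is Gaussian with width sigma_tot.
    s_obs = s_gal + sigma_tot * rng.normal(0.0, 1.0, size=m_true.shape)
    f_nmgy = s_obs / t_eff
    f_err_nmgy = sigma_tot / t_eff
    # Non-positive observed fluxes yield NaN or infinite magnitudes here.
    with np.errstate(invalid="ignore", divide="ignore"):
        m_obs = ZP - 2.5 * np.log10(f_nmgy)                 # eq. (16)
        m_err = 2.5 / np.log(10.0) * f_err_nmgy / f_nmgy    # eq. (16)
    return m_obs, m_err
```

For example, `add_photometric_noise(mags, m_lim=23.0, t_eff=1000.0, sigma_int_width=0.25, rng=np.random.default_rng(1))` would perturb an array `mags`; the numerical values in this call are placeholders, not survey parameters from Table 2.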

We provide galaxy magnitude limits for a number of existing and planned surveys. Table 2 lists all the survey magnitudes included in our simulations, including the filters and the limiting magnitudes for each filter. The existing surveys included are SDSS DR8 (Aihara et al. 2011), SDSS Stripe 82 coadds (Annis et al. 2014), WISE (Jarrett et al. 2011), VHS (McMahon et al. 2013), VIDEO (Jarvis et al. 2013), VIKING (Sutherland 2012), and DES DR2 (DES Collaboration et al. 2021). We also produce magnitudes for Rubin Observatory's Legacy Survey of Space and Time, where one-year limiting magnitudes are obtained by re-scaling the projected 10-year depth for 5σ point source detections$^4$.

Table 2. Survey limiting magnitudes.

Survey            Limits

DECam             u g r i z Y
DES DR2 10σ       24.07 23.82 23.11 22.28 20.79

SDSS              u g r i z
DR8 (a)           20.4 21.7 21.2 20.8 19.3
Stripe82 (a)      22.1 23.4 23.1 22.6 21.2

VISTA             z Y J H Ks
VHS (b)           20.1 19.7 19.5
VIKING (b)        21.6 20.9 20.8 20.2 20.2
VIDEO             25.7 24.6 24.5 24.0 23.5

WISE              3.4µ 4.6µ
WISE (c)          17.1 15.7

LSST              u g r i z Y
LSST-1 year (e)   24.2 25.8 25.9 25.2 24.0 23.15

NOTE: All limiting magnitudes are 10σ AB magnitudes for galaxies unless photometric errors are not provided.

(a) Limits appropriate for SDSS model magnitudes used for color measurements.

(b) Limits for 2′′ aperture-corrected magnitudes. Magnitudes have been converted from Vega to AB such that $z_{\rm AB} = z_{\rm Vega} + 0.52$; $Y_{\rm AB} = Y_{\rm Vega} + 0.62$; $J_{\rm AB} = J_{\rm Vega} + 0.94$; $H_{\rm AB} = H_{\rm Vega} + 1.38$; $K_{s,{\rm AB}} = K_{s,{\rm Vega}} + 1.8$.

(c) Limits for MAG_AUTO. Magnitudes have been converted from Vega to AB such that $J_{\rm AB} = J_{\rm Vega} + 0.91$; $K_{s,{\rm AB}} = K_{s,{\rm Vega}} + 1.85$ (Blanton et al. 2005).

(e) Rescaled from proposed 10-year depth for 5σ point source detections.

7. VALIDATION AGAINST SDSS DATA

We now present tests of our magnitude and SED assignment algorithms. Here we focus primarily on the global distribution of galaxy colors, clustering as a function of color, and radial profiles of galaxies around clusters in the local Universe, compared to data from SDSS. DeRose et al. (2019a) presents additional tests of the model, including additional comparisons between our synthetic catalogs and observations of galaxy clusters, as well as comparisons of luminous red galaxy populations, high-redshift colors, and photometric redshifts in the DES. All of the comparisons presented in this section use ADDGALS run on the L1 lightcone, producing a quarter sky (10,313.25 square degree) footprint out to z = 0.32.

4 https://docushare.lsstcorp.org/docushare/dsweb/Get/LPM-17

7.1. One-Point Statistics

We first investigate the ability of ADDGALS to reproduce color, magnitude, and redshift distributions by comparing with the SDSS main galaxy sample. The left panels of fig. 5 show histograms of observed magnitude counts in griz bands. The ADDGALS catalog is compared to the magnitude-limited SDSS main sample with $m_r < 17.77$, where the error bars shown are computed with jackknife using regions of approximately 200 sq. degrees. The agreement is good to better than 10% to r ∼ 13, with similar performance at the same number density in the other bands. The discrepancies seen on the bright end are a result of the redshift evolution imposed in our input rest-frame luminosity function in order to match DES galaxy number densities as described in DeRose et al. (2019a). The upturns at the faint end in griz, where SDSS is very incomplete, are sensitive to the assumed photometric error model. The u-band (not plotted) performs significantly worse as a result of the discrepancies seen in fig. 6, which we believe are largely due to the fact that the SED templates are not fully tuned to u-band data.

The right panel of fig. 5 shows the redshift distribution for galaxies in the simulated catalogs compared with those in the SDSS DR7 main sample, using the same magnitude and redshift cuts as the previous comparison. Error bars are computed using the same jackknife procedure. Again we find good agreement.
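The jackknife error bars quoted for these comparisons can be illustrated with a delete-one estimator over sky regions. The sketch below is not the authors' code: it assumes equal-area regions and a statistic (for example, counts per magnitude or redshift bin) that has already been measured separately in each region, and the function name is hypothetical.

```python
import numpy as np


def jackknife_mean_and_error(per_region):
    """Delete-one jackknife over sky regions.

    per_region: array of shape (n_regions, n_bins) holding a statistic
    measured independently in each (assumed equal-area) region.
    Returns the jackknife mean and 1-sigma error per bin.
    """
    x = np.asarray(per_region, dtype=float)
    n = x.shape[0]
    total = x.sum(axis=0)
    loo = (total - x) / (n - 1)   # mean with region i left out
    mean = loo.mean(axis=0)
    err = np.sqrt((n - 1) / n * ((loo - mean) ** 2).sum(axis=0))
    return mean, err
```

For a quarter-sky footprint split into roughly 200 square degree patches, `per_region` would have on the order of 50 rows. For a simple mean, this estimator reduces to the standard error of the mean across regions.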

The top section of fig. 6 presents the distributions of observed u − g, g − r, r − i, and i − z as a function of $M_r$ in our simulations (blue), compared with those from our training set (black). Also displayed in red are the distributions that are obtained when reconstructing each training sample galaxy's magnitudes using only their KCORRECT coefficients. The same magnitude and redshift cuts applied to the training set are also applied to the ADDGALS catalog. Different rows in this figure show bins of absolute magnitude as indicated by the labels.

Although the SEDs from ADDGALS are selected from a training set of these SDSS galaxies, matches in the global color distribution are not guaranteed. The reason for this is twofold. First, if the joint distribution of $M_r$ and z found in SDSS is not reproduced in our simulations, then even if $p({\rm SED}|M_r)$ is perfect, the observed colors will not be matched by the simulation. Second, KCORRECT coefficients are a lossy compression of galaxy SEDs, and so when these SED representations are integrated over bandpasses, they are not guaranteed to exactly reproduce observed magnitudes and colors.

For g − r, r − i and i − z we find very good agreement between SDSS and ADDGALS for all magnitude bins. For u − g the agreement between ADDGALS and SDSS is significantly worse for all magnitudes, with ADDGALS showing a narrower red sequence that is slightly shifted to low u − g relative to the data. The reason for this discrepancy is that the KCORRECT coefficients that are used to represent the training set SEDs are not fit to the u-band in SDSS. This means that these SED fits do a worse job at reproducing u − g colors, even when comparing the colors predicted from the template fits to the observed colors on a galaxy-by-galaxy basis in our training set (Blanton et al. 2003a). This is evidenced by the fact that the red lines also show the same disagreement with black. The bottom section of fig. 6 shows joint distributions of u − g and g − r, g − r and r − i, and r − i and i − z colors. Again we see worse performance in u − g due to the aforementioned issues with KCORRECT model fits, but otherwise the agreement is very good.

7.2. Color-Dependent Clustering

We have shown previously in section 4 that the correlation function of galaxies at a given absolute magnitude can be well matched using ADDGALS. Here we test whether the SED assignment algorithm is able to reproduce clustering as a function of magnitude and color by splitting our simulated samples into red and blue sub-samples and comparing again to the SDSS DR7 main galaxy sample.

In fig. 7 we compare our simulations to projected clustering measurements in magnitude bins from Zehavi et al. (2011b), adhering to their definition of red galaxies: $g - r > 0.21 - 0.03M_r$. We employ the same $w_p(r_p)$ estimator procedure as outlined in section 4, but now distribute the randoms uniformly over the unmasked 10,313 square degrees covered by our lightcone simulations, drawing redshifts for our random points to match the distribution of redshifts followed by each galaxy sample separately. The same redshift binning as employed by the SDSS measurements is used for each magnitude bin. Errors on our simulations are estimated via jackknife using ≈ 200 square degree regions.
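The red/blue split used for this comparison is a single linear cut in the color-magnitude plane. A minimal sketch, with hypothetical function and argument names, is:

```python
import numpy as np


def is_red(g_minus_r, M_r):
    """Red-galaxy cut quoted in the text: g - r > 0.21 - 0.03 * M_r."""
    return np.asarray(g_minus_r) > 0.21 - 0.03 * np.asarray(M_r)


def split_by_color(g_minus_r, M_r, m_bright, m_faint):
    """Boolean masks (red, blue) for galaxies with m_bright < M_r < m_faint."""
    M_r = np.asarray(M_r)
    in_bin = (M_r > m_bright) & (M_r < m_faint)
    red = in_bin & is_red(g_minus_r, M_r)
    blue = in_bin & ~is_red(g_minus_r, M_r)
    return red, blue
```

Calling `split_by_color(gr, Mr, -21.0, -20.0)` would select the red and blue sub-samples of the $-21 < M_r < -20$ bin; at $M_r = -20$ the dividing color is $g - r = 0.81$.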

The trends in the SDSS data are reproduced by our simulations, with red galaxies significantly more clustered than their blue counterparts at fixed $M_r$. The discrepancies between the red and blue galaxy clustering measurements in our simulations and those in SDSS are largely a consequence of issues with the clustering as a function of $M_r$, especially for the faintest sample shown. In the bottom panel of this figure, the measurements in the top panel are divided by the $w_p(r_p)$ measurements for samples with the same absolute magnitude selection, but without color selection, in order to remove discrepancies caused by issues in the model as a function of $M_r$ alone. Although the fit is not good in a chi-squared sense, we see that most of the discrepancy in the top panel is due to imperfect modeling of clustering as a function of $M_r$, not our SED assignment algorithm, although red galaxies are still slightly under-clustered with respect to the SDSS measurements.

Despite the reasonable performance exhibited in figure 7, ADDGALS is not able to reproduce the abundance of galaxy clusters as a function of richness, λ, a common mass proxy used in analyses of REDMAPPER clusters (Rozo et al. 2011; Rykoff et al. 2014). To see why this is, fig. 8 compares projected galaxy profiles around REDMAPPER galaxy clusters between ADDGALS and SDSS. The SDSS measurements are taken from Baxter et al. (2017), and our measurements use the same procedure as detailed there. The only difference between our measurements and the measurements in Baxter et al. (2017) is that the richness cut made on the cluster catalog has been adjusted to λ > 9.3 rather than λ > 20 in order to match the abundance of clusters found in SDSS. In doing so, this figure examines galaxy profiles around halos of similar masses in the simulations and SDSS data. Profiles around clusters of the same richness show much better agreement, mostly due to the constraint that equal richness imposes on the projected galaxy number densities at the cluster boundary.


[Figure 5 plot. Left: n(m) versus m [mag] in the g, r, i, and z bands, with fractional residuals Δn(m)/n(m)_sdss below. Right: n(z) [deg^-2] versus z. Legend: Addgals, SDSS.]

Figure 5. Left: Galaxy counts in the SDSS griz bands, for all galaxies brighter than $m_r < 17.7$ and at redshifts z < 0.2. Black points indicate the SDSS DR7 VAGC sample and lines are the L1 ADDGALS simulation. Error bars are calculated via jackknife on ∼ 200 sq. degree regions. Right: Redshift distribution for simulated galaxies selected to match the SDSS spectroscopic sample, compared to the redshift distribution for galaxies with measured redshifts in the main sample of SDSS DR7. In each case, the galaxies are limited by 14.0 < r < 17.7 and 0 < z < 0.2, to match the SDSS training sample used. Error bars are calculated via jackknife on ∼ 200 sq. degree regions.

The top panel shows galaxy profiles for three different samples: all galaxies with $M_r < -19.43$ in black, and galaxies in the top and bottom quartiles of rest-frame g − r in red and blue respectively. At large scales all profiles agree quite well with the measurements in SDSS, evidencing the fact that the number densities and biases of these samples in SDSS and ADDGALS are very similar. On small scales, all three samples in our simulations exhibit much shallower profiles than seen in the data. The deficits in the red and blue samples are driven entirely by the lack of galaxies in general on these scales, and not an issue with the quenched fraction of galaxies as a function of radius $f_q(r)$. This can be seen more explicitly in the bottom panel, where we divide the red and blue galaxy profiles by the total profiles for the simulations and data respectively. Here we see that $f_q(r)$ is actually over-predicted in our simulations for the halo mass range probed by these measurements.

The reason for the deficit in the total galaxy profile is likely artificial subhalo disruption in the T1 simulation, which is then inherited by the ADDGALS model via our training process. Higher-resolution simulations, or an orphan model in the T1 simulation, may help to remedy these issues. Indeed, DeRose et al. (2021a) shows that the inclusion of a model for orphan galaxies can significantly improve the ability of SHAM to fit $M_r < -19$ clustering measurements. This improvement is facilitated by a large increase in galaxy occupation for $M_{\rm vir} > 10^{13}\,h^{-1}M_\odot$, and as such would also remedy the galaxy number density issues in ADDGALS at the cluster mass scale. This increase in satellite fraction boosts large-scale bias, but this is compensated for by a decrease in assembly bias, which has the competing effect of decreasing large-scale bias while largely maintaining the one-halo clustering signal (see, e.g., the top left panel of figure 2 in DeRose et al. 2021a). Work is ongoing to incorporate orphan galaxies into the SHAM models that ADDGALS is trained on, but until these improvements are fully implemented the cluster properties in ADDGALS catalogs must be treated with caution.

8. RESOLUTION REQUIREMENTS OF ADDGALS

Now that we have presented and validated the ADDGALS algorithm, it is important to understand its resolution requirements, as their relatively modest level is one of the algorithm's major strengths. The left side of fig. 3 compares the projected clustering of ADDGALS run on the z = 0 snapshots of three simulations with progressively lower mass resolution, T1, L1 and L2, where the measurements and errors are computed as described in section 4. In all cases, simulations are converged with respect to the errors on the measurements in the T1 simulation, which has a similar volume to the SDSS main galaxy sample. The L1 and L2 models are also in relatively good agreement, although they show discrepancies at the 10% level on small scales for the $M_r < -22$ sample and 5% for $M_r < -21$ and $M_r < -20$ on scales $r_p < 1\,h^{-1}{\rm Mpc}$.

The discrepancies between ADDGALS and the SDSS data for the $M_r < -19$ sample are not due to an insufficiency of the ADDGALS algorithm, but rather resolution effects in the SHAM catalog that ADDGALS is trained on. This is demonstrated in the right side of fig. 9, where ADDGALS models trained on two different SHAM models are compared. One is our fiducial SHAM run on the T1 simulation. The other is a SHAM run on the higher resolution C250 simulation, which was run with the same settings as the T1 simulation, but with a simulation volume of $(250\,h^{-1}{\rm Mpc})^3$, $2560^3$ particles and a force softening of $\epsilon = 0.8\,h^{-1}{\rm kpc}$. The clustering of the SHAM C250 model is increased on small scales due to reduced subhalo disruption in the C250 simulation relative to T1. We do not compare to the SDSS data here because the C250 simulation is too small to use the same line-of-sight projection length of $\pi_{\rm max} = 60$ for $w_p(r_p)$ as used in the data. Instead we use $\pi_{\rm max} = 20$ for this comparison. Nonetheless it is apparent that the SHAM C250 model would agree with the SDSS measurements on small scales. The ADDGALS model trained on the SHAM C250 also inherits this increased clustering on small scales. This suggests that with a sufficiently high resolution training simulation, or an orphan model that traces substructure effectively until it is physically disrupted, ADDGALS could reproduce the small-scale clustering of an $M_r < -19$ sample using a simulation with the resolution of the L1 or even L2 simulation.

[Figure 6 plot. Top: p(c) for u − g, g − r, r − i, and i − z in rows −22 < Mr < −21, −21 < Mr < −20, and −20 < Mr < −19. Legend: SDSS, kcorrect, Addgals. Bottom: u − g versus g − r, g − r versus r − i, and r − i versus i − z. Legend: SDSS, Addgals.]

Figure 6. Top: Distributions of u − g, g − r, r − i, and i − z colors (columns) in bins of absolute magnitude (rows). In all panels, the black line represents the distribution in SDSS DR7, while the blue lines show the distributions for the ADDGALS L1 catalog, and the red dashed lines show the colors predicted from the SDSS DR7 training set if reconstructed from their KCORRECT fits. Nearly all discrepancies between ADDGALS and SDSS are due to inaccuracies in KCORRECT, not our method for assigning SEDs to our simulations, as can be seen by the good match between the blue and red lines. Bottom: Color–color distribution for galaxies with $m_r < 17.77$. In both panels, grey contours show measurements from the SDSS DR7 main galaxy sample; blue contours show the ADDGALS catalog. Contours include 39% and 84% of the galaxies.

[Figure 7 plot. Top: $r_p w_p(r_p)$ versus $r_p$ [$h^{-1}$Mpc] in panels −20 < Mr < −19, −21 < Mr < −20, and −22 < Mr < −21. Bottom: $w_p(r_p)/w_p(r_p)_{\rm tot}$ versus $r_p$. Legend: Addgals, SDSS.]

Figure 7. Top: Projected galaxy correlation function in magnitude-selected samples for ADDGALS applied to the L1 simulation (lines) compared to the measurements from SDSS (Zehavi et al. 2011b; points). Correlation functions binned by $M_r$ only are shown in black; red and blue galaxies are shown in red and blue. Bottom: The red and blue clustering measurements as shown in the top panel divided by the same measurements without color selection for the ADDGALS L1 simulations and the data.

[Figure 8 plot. Top: $\Sigma_g(r)$ [$h^2\,{\rm Mpc}^{-2}$] versus r [$h^{-1}$Mpc]. Bottom: $\Sigma_g(r)/\Sigma(r)_{\rm tot}$ versus r. Legend: Addgals, SDSS.]

Figure 8. Top: Projected galaxy profiles around REDMAPPER clusters for red, blue, and all galaxies with $M_r < -19.43$. Profiles measured in L1 ADDGALS are compared to measurements from Baxter et al. (2017). The deficit in clustering that is seen for all samples is likely due to artificial subhalo disruption in the SHAM model that the ADDGALS model is trained on. This effect is important at larger scales in this measurement than in fig. 3, because it includes higher mass halos. Bottom: Projected galaxy profiles split by color, normalized by the profile for all galaxies. The trends in color are well captured by ADDGALS although the quenched fraction $f_q(r)$ is slightly over-predicted at small scales.

In practice, the more important parameter governing the convergence of the ADDGALS method for projected two-point functions is the minimum mass to which central galaxies are populated, $M_{\rm min}$. This can be seen on the left side of fig. 9. All catalogs included in this figure are run on the T1 simulation, varying $M_{\rm min}$ between $6\times10^{12}$ and $5\times10^{13}\,h^{-1}M_\odot$ as indicated in the legend of the figure but keeping all other parameters fixed. In most cases the shifts are smaller than the errors on the measurements, but it is clear that this is a very important free parameter of the ADDGALS algorithm, especially for brighter samples. The dependence on this parameter hints at a breakdown, in the bright galaxy regime, of our assumption that matching $p(R_\delta|M_r, z)$ of a SHAM is sufficient to reproduce $w_p(r_p)$ in that SHAM, since even for a high-resolution simulation such as T1, significant discrepancies in clustering are produced when too many bright galaxies are populated using this relation, whereas faint galaxies are much less affected. This breakdown is sourced by the fact that $p(R_\delta|M_{\rm vir}, M_r, {\rm central})$ does not evolve significantly for $M_{\rm vir} > 10^{13}\,h^{-1}M_\odot$ for bright galaxies. As such, if we do not place centrals in halos with $M_{\rm vir} > 10^{13}\,h^{-1}M_\odot$, then using $p(R_\delta|M_r, z)$ induces significant scatter in $M_{\rm vir}$, placing galaxies that should have been centrals of lower mass halos as satellites in high mass halos, thus significantly boosting the clustering signals for bright galaxies as seen in the left hand side of fig. 9. The exact halo mass that we must populate central galaxies down to very likely depends on the mass used to compute $R_\delta$, but we have not explored this dependence in detail.

Finally, we expect ADDGALS to fail for simulations where the mass scale used to define $R_\delta$ is too coarsely resolved by the particle resolution. In this regime, the density estimates produced by $R_\delta$ will not consistently measure similar environments in the training volumes versus the volumes used for the synthetic catalogs. As simulations of this low resolution are increasingly irrelevant due to advances in computational power, we have not explored this effect in detail.

9. CONCLUSIONS

We present the ADDGALS (Adding Density-Determined Galaxies to Lightcone Simulations) algorithm, designed to produce realistic simulated galaxy populations with only a modest computational cost. To achieve this goal, we combine empirical models of the galaxy–halo connection in high-resolution simulations with a custom, physically motivated machine learning model that is trained to place galaxies into lower resolution volumes. This combination of techniques, which explicitly incorporates key statistical information from the data (e.g., the luminosity function and the distribution of colors/SED types), lends a baseline level of realism to the output catalog. In this work, we show that we are able to match several characteristics of the input training catalog, including matching the clustering properties of the input empirical model as a function of r-band absolute magnitude and redshift. We also demonstrate that we can produce realistic color distributions and can reproduce the most significant trends in clustering as a function of color. Several further comparisons are presented in DeRose et al. (2019a), which additionally describes associated weak lensing catalogs and additional redshift evolution, and tests photometric redshift and cluster finding methodology in higher-redshift synthetic surveys.

[Figure 9 plot. Left: $r_p w_p(r_p)$ versus $r_p$ [$h^{-1}$Mpc] in panels Mr < −22, Mr < −21, Mr < −20, and Mr < −19. Legend: Mmin = 6×10^12, Mmin = 1×10^13, Mmin = 5×10^13. Right: $r_p w_p(r_p)$ versus $r_p$ [$h^{-1}$Mpc]. Legend: SHAM C250, SHAM T1, Addgals L1-C400, Addgals L1-C250.]

Figure 9. Impact of resolution on galaxy clustering measurements in ADDGALS. Left: Projected correlation functions for absolute-magnitude-limited samples of $M_r < -22, -21, -20, -19$ for ADDGALS models, varying the minimum halo mass, $M_{\rm min}$, used to populate central galaxies. This is the parameter that our models are most sensitive to, and largely drives the resolution requirements for ADDGALS. Right: Projected correlation functions for $M_r < -19$ samples, varying the resolution of the input simulation used for the SHAM model. The default T1 simulation (black) is compared to a higher resolution C250 simulation (red); the latter model has stronger clustering below ∼ $1\,h^{-1}{\rm Mpc}$ due to reduced artificial subhalo disruption. The ADDGALS model trained on this simulation, ADDGALS L1-C250 (yellow), also inherits this increased clustering compared to the default ADDGALS L1-C400 model (blue).

The modest simulation requirements of this method have enabled us to produce volumes of synthetic sky surveys that would be significantly more computationally expensive with other methods in active use. This includes, for example, the ability to produce the large and deep sky areas appropriate for modeling modern photometric surveys like DES, LSST, Euclid, and the Roman Space Telescope surveys. The catalogs created with the ADDGALS method have been used for a wide variety of applications including tests of photometric redshift, clustering, weak lensing, cross-correlation, and cluster finding methodology. In companion papers, the modest computational cost has allowed us to produce a significant number of such catalogs, e.g. tens of full area and depth realizations, which have been used to statistically test the performance of precision cosmological probes in the DES (MacCrann et al. 2018; DeRose et al. 2019a, 2021a).

One of the distinguishing features of ADDGALS’ machine-learning model is that it uses a hand-crafted parameterization as opposed to a more generic functional form (e.g., a tree ensemble, neural network, etc.). This parameterization is constructed specifically to fit the simulation data, especially near the tails of the distribution where some extrapolation is needed. Looking ahead, generalizations of the ADDGALS algorithm that employ more traditional machine-learning models may be able to achieve better performance, but it is likely special attention will need to be paid to how these models extrapolate beyond the training data. This lesson is likely quite general for any machine learning model of this type, given that high-resolution simulations typically sample less volume of the universe than lower-resolution ones.

This method is not without limitations. In particular, it has difficulty precisely reproducing the small-scale clustering measured from SDSS of the faintest galaxy samples considered in this paper. This issue is likely inherited from artificial subhalo disruption in the simulation that ADDGALS is trained on. This deficit in clustering leads to a low normalization of the HOD, and a deficit of galaxies at the cores of cluster mass halos with respect to observations (DeRose et al. 2019a). The realistic properties of galaxy cluster populations do enable us to run modern cluster finders on the simulated data, but the lack of galaxies in the central regions of massive groups and clusters leads to an offset in the mass–richness relation that can hinder some use cases related to important cluster selection systematics.

Additionally, our SED assignment algorithm requires a representative sample of observed galaxies from which to draw SEDs in order to accurately reproduce galaxy colors. If there are SEDs that appear at high redshift, or fainter absolute magnitudes, that are not present in our low redshift SDSS training set, then the assumptions used by ADDGALS to populate SEDs will be broken. We showed here that in the local Universe, these assumptions can also be broken outside the wavelength range where the SED templates are well tuned, and we urge some caution for this reason in using bands outside of rest-frame g through Y bands. For a discussion of the extent to which these assumptions are broken in DES, see DeRose et al. (2019a). We expect that significant progress can be made in these areas especially using data from larger, deeper spectroscopic surveys to train the model.

A final significant issue with this methodology compared to more accurate methods based on fully resolved halo substructures and their histories (including hydrodynamical models, semi-analytic models, or empirical models based on high-resolution merger trees) is that it may lack important correlations that are expected in such models. These could include, for example, the correlated properties of both central and satellite galaxies with each other and with the larger-scale environment.

Given the size of ongoing and upcoming surveys, and their demands for accurate reproduction of galaxy magnitudes, colors, and spatial clustering, it is likely that techniques which combine empirical methods with machine learning methods in order to reduce computational cost will remain a necessary tool for precision cosmology for the foreseeable future. They will also provide a very useful complement to higher-fidelity simulations that can be produced over smaller volumes. It is thus worth considering how to mitigate some of the limitations discussed above, and active work in each of these areas is ongoing.

ADDGALS data, including a one-quarter sky simulation out to z = 2.35 to a depth of r = 27, with magnitudes appropriate for modeling several surveys including SDSS, DES, VISTA, WISE, and LSST, will be available upon publication at http://www.slac.stanford.edu/~risa/addgals.

RHW thanks her many collaborators for near infinite patience on the completion of this paper, which was begun in another era. We thank Rachel Reddick, Alex Ji, and our collaborators on the maxBCG team and in the DES collaboration, especially Chihway Chang, Carlos Cunha, Joerg Dietrich, Sarah Hansen, Brandon Erickson, Daniel Gruen, Benjamin Koester, Niall MacCrann, Chris Miller, Eduardo Rozo, Erin Sheldon, Tim McKay, and Molly Swanson, for significant useful feedback on several earlier versions of these catalogs. We thank Andreas Berlind, Derek Bingham, Joanna Dunkley, Andrew Hearin, Andrey Kravtsov, Yao-Yuan Mao, Hiranya Peiris, Eduardo Rozo, Frank van den Bosch, and Martin White for useful discussions about methodology and statistical inference during early development. This work received support from the U.S. Department of Energy under contract number DE-AC02-76SF00515 at SLAC National Accelerator Laboratory, and a Terman Fellowship at Stanford University. JD is supported by the Chamberlain Fellowship at Lawrence Berkeley National Laboratory. Argonne National Laboratory’s work was supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics, under contract DE-AC02-06CH11357.

This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231. Some of the computing for this project was performed on the Sherlock cluster, and on computing resources at SLAC National Accelerator Laboratory. We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. We are grateful to Stuart Marshall and the rest of the SLAC computing team for extensive support of this work.

This study made use of the SDSS DR7 Archive (as well as earlier versions while the model was in development), for which funding has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Aeronautics and Space Administration, the National Science Foundation, the U.S. Department of Energy, the Japanese Monbukagakusho, and the Max Planck Society. The SDSS Web site is http://www.sdss.org/. The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating Institutions: the University of Chicago, Fermilab, the Institute for Advanced Study, the Japan Participation Group, the Johns Hopkins University, Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, University of Pittsburgh, Princeton University, the United States Naval Observatory, and the University of Washington. The authors acknowledge the support and stimulating environments of the Aspen Center for Physics and the Kavli Institute for Theoretical Physics (under NSF Grant No. PHY99-07949) where some of this work was performed.

A. CONSTRUCTING THE TRAINING GALAXY CATALOG WITH SUBHALO ABUNDANCE MATCHING

In this work, we use subhalo abundance matching (SHAM, e.g. Conroy et al. 2006b; Behroozi et al. 2010; Wetzel & White 2010; Reddick et al. 2013; Lehmann et al. 2017) to construct the training data for the ADDGALS model. Specifically, we employ the model described in Lehmann et al. (2017), placing galaxies into resolved halos and subhalos by matching the number density of galaxies as a function of absolute magnitude with that of the dark matter halos or subhalos as a function of $v_\alpha = v_{\rm vir}\,(v_{\rm max}/v_{\rm vir})^{\alpha}$. The quantities $v_{\rm max}$ and $v_{\rm vir}$ are evaluated in this equation at the time when the halo is accreted onto a larger halo. We take α = 0.684 and $\sigma(M_r|v_\alpha) = 0.425$. The parameter α can be thought of as determining the concentration dependence of a halo’s rank ordering, with larger α giving higher concentration halos a higher rank at fixed mass. The choice to evaluate the velocities used to calculate $v_\alpha$ at the epoch when the halo is accreted onto a larger halo is based on the idea that a galaxy’s stellar mass should be much less susceptible to stripping than the outer regions of its dark matter halo (Conroy et al. 2006b; Reddick et al. 2013).
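The rank-ordering step can be sketched as follows. This is a simplified illustration with hypothetical function names: it performs zero-scatter abundance matching against a tabulated cumulative luminosity function, whereas the model described above additionally includes the scatter $\sigma(M_r|v_\alpha) = 0.425$, which is not implemented here.

```python
import numpy as np


def v_alpha(v_vir, v_max, alpha=0.684):
    """Matching proxy v_alpha = v_vir * (v_max / v_vir)**alpha."""
    v_vir = np.asarray(v_vir, dtype=float)
    v_max = np.asarray(v_max, dtype=float)
    return v_vir * (v_max / v_vir) ** alpha


def abundance_match(v_alpha_vals, volume, mag_grid, n_cum_grid):
    """Zero-scatter abundance matching (a simplification of the text's model).

    mag_grid: absolute magnitudes in increasing order (bright to faint).
    n_cum_grid: cumulative number density n(<M) on mag_grid, increasing.
    volume: simulation volume in units consistent with n_cum_grid.
    Returns one magnitude per (sub)halo, in the input order.
    """
    v_alpha_vals = np.asarray(v_alpha_vals, dtype=float)
    order = np.argsort(v_alpha_vals)[::-1]             # largest v_alpha first
    n_halo = (np.arange(len(order)) + 1) / volume      # n(>v_alpha) per rank
    mags = np.empty(len(order))
    mags[order] = np.interp(n_halo, n_cum_grid, mag_grid)
    return mags
```

Setting `alpha=0` ranks halos by $v_{\rm vir}$ alone and `alpha=1` by $v_{\rm max}$ alone, which is the sense in which α controls the concentration dependence of the ranking.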

SHAM models generally require a single observational input, the redshift-dependent galaxy luminosity function (LF) in a given band. We find that a pure Schechter function is insufficient to model galaxy luminosities for our purposes. At bright luminosities, there are significantly more galaxies than a pure exponential model would predict (see, e.g., Blanton et al. 2003b; Bernardi et al. 2013). In particular, the steep bright-end slope of a Schechter function results in a very flat mass–luminosity relation for brightest-cluster galaxies (BCGs) when using abundance matching, a relation that is inconsistent with observations (e.g., Hansen et al. 2009b; Kravtsov et al. 2018; To et al. 2020). Using a luminosity function that more closely matches observations relieves this tension.

We measure the luminosity function directly from the SDSS DR7 VAGC, using the same method outlined in Reddick et al. (2013). To this measurement, we fit a modified double-Schechter function with a Gaussian at the bright end, as given by

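$$
\Phi(M) = 0.4\ln(10)\, e^{-10^{-0.4(M-M_*)}}\, \phi_1\, 10^{-0.4(M-M_*)(\alpha_1+1)}
+ 0.4\ln(10)\, e^{-10^{-0.4(M-M_*)}}\, \phi_2\, 10^{-0.4(M-M_*)(\alpha_2+1)}
+ \frac{\phi_3}{\sqrt{2\pi\sigma_{\rm hi}^2}}\, e^{-\frac{(M-M_{\rm hi})^2}{2\sigma_{\rm hi}^2}} . \qquad (17)
$$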
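The functional form of equation 17 can be evaluated numerically as in the short Python sketch below. The central values are copied from Table 3 (uncertainties are ignored), while the names `TABLE3` and `lum_func` are hypothetical and introduced only for illustration.

```python
import numpy as np

# Central values from Table 3 (z = 0.05, SDSS DR7 r band); uncertainties omitted.
TABLE3 = dict(
    phi1=0.0156, phi2=0.00671, alpha1=-0.166, alpha2=-1.523,
    m_star=-19.88, phi3=3.08e-5, m_hi=-21.72, sigma_hi=0.484,
)


def lum_func(M, phi1, phi2, alpha1, alpha2, m_star, phi3, m_hi, sigma_hi):
    """Double-Schechter function plus a bright-end Gaussian (equation 17)."""
    M = np.asarray(M, dtype=float)
    x = 10.0 ** (-0.4 * (M - m_star))  # luminosity in units of L*
    schechter = 0.4 * np.log(10.0) * np.exp(-x) * (
        phi1 * x ** (alpha1 + 1.0) + phi2 * x ** (alpha2 + 1.0)
    )
    gauss = phi3 / np.sqrt(2.0 * np.pi * sigma_hi**2) * np.exp(
        -((M - m_hi) ** 2) / (2.0 * sigma_hi**2)
    )
    return schechter + gauss
```

Calling `lum_func(np.array([-23.0, -21.0, -19.0]), **TABLE3)` shows that at the bright end the Gaussian term dominates over the exponentially suppressed Schechter terms, which is the behavior the extra component is meant to capture.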

At z = 0.05, we find that equation 17 with the parameters listed in table 3 reproduces the observations extremely well. We also include evolution in this luminosity function with redshift by allowing for evolution in $\phi_i$, $M_*$ and $M_{\rm hi}$ of the form:

$$
M_{*/{\rm hi}}(z) = M_{*/{\rm hi},0} + Q\left(\frac{1}{1+z} - \frac{1}{1.1}\right), \qquad (18)
$$

and

$$
\phi_i(z) = \phi_{i,0} + Pz. \qquad (19)
$$

The value of P is taken from Cool et al. (2012), but Q is fit to match counts as a function of magnitude from DES Y1 data. This evolution is constrained to be very small over the redshift range relevant to the current work. We refer readers to DeRose et al. (2019a) for more details related to how this evolution is constrained.

Table 3. Parameters of the SDSS DR7 r-band z = 0.05 luminosity function as defined by equation 17.

Parameter          Value
$\phi_1$           $0.0156 \pm 0.03\ h^{-1}\,{\rm Mpc}$
$\phi_2$           $0.00671 \pm 0.00029\ h^{-1}\,{\rm Mpc}$
$\alpha_1$         $-0.166 \pm 0.041$
$\alpha_2$         $-1.523 \pm 0.01$
$M_*$              $-19.88 \pm 0.03 - 5\log(h)$
$\phi_3$           $(3.08 \pm 3.24)\times10^{-5}\ h^{-1}\,{\rm Mpc}$
$M_{\rm hi}$       $-21.72 \pm 0.52 - 5\log(h)$
$\sigma_{\rm hi}$  $0.484 \pm 0.192$
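The redshift evolution of equations 18 and 19 amounts to two one-line transformations, sketched below in Python. The sketch reads equation 18 as $Q\,[1/(1+z) - 1/1.1]$, so that the magnitude parameters are unchanged at z = 0.1. The function names are hypothetical, and because the text does not quote numerical values for Q and P, they are left as required arguments rather than given defaults.

```python
def evolve_magnitude(m0, z, Q):
    """Equation 18: shift M_* or M_hi with redshift; zero shift at z = 0.1."""
    return m0 + Q * (1.0 / (1.0 + z) - 1.0 / 1.1)


def evolve_phi(phi0, z, P):
    """Equation 19: linear evolution of a normalization phi_i with redshift."""
    return phi0 + P * z
```

Any values of `Q` and `P` passed to these functions in a test are placeholders for illustration only; the constrained values are described in DeRose et al. (2019a).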

With the luminosity function above, we use SHAM to populate all 100 snapshots of the T1 simulation. Our catalogs are complete down to roughly $M_r - 5\log(h) = -19$ in the T1 simulation and provide an excellent fit to the observed SDSS magnitude-dependent two-point correlation function as measured in Reddick et al. (2013). A comparison of the SHAM algorithm applied to the T1 simulation with SDSS data is shown in Fig. 3, and is described further in Sec. 4.

B. MODELING P(Rδ|MR,Z)

In order to determine $\Theta(x,z)$ (see eq. (7)), $R_\delta$ is measured at the position of each galaxy in the SHAM catalogs. The function $\hat{p}(R_\delta|M_r < x_i, z_j)$ is then determined, where the hat denotes that this is a measured quantity in the $i$-th magnitude bin and $j$-th snapshot, using magnitude bins with width $\Delta M_r = 0.1$ between $-23 < x_i < -18$ in the 56 snapshots with $z_j < 2.5$. We use $\hat{p}(R_\delta|M_r < x_i, z_j)$ rather than $\hat{p}(R_\delta|M_r = x_i, z_j)$ because the former quantity is significantly less noisy at bright magnitudes, and we have found that this allows for more robust estimation of the parameters in Eq. 7.
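The measurement described above can be pictured as one normalized histogram of $R_\delta$ per magnitude cut. The Python sketch below is an illustration under stated assumptions: the function name `cumulative_density_pdfs` is hypothetical, the $R_\delta$ bin edges are supplied by the caller (the text does not specify the binning), and a single snapshot is treated at a time.

```python
import numpy as np


def cumulative_density_pdfs(r_delta, mag, mag_cuts, r_bin_edges):
    """Estimate p_hat(R_delta | M_r < x_i) for each magnitude cut x_i.

    Returns an array of shape (len(mag_cuts), len(r_bin_edges) - 1) whose
    rows are histograms normalized to integrate to one over R_delta.
    Rows for cuts with no galaxies are left as zeros.
    """
    r_delta = np.asarray(r_delta, dtype=float)
    mag = np.asarray(mag, dtype=float)
    pdfs = np.zeros((len(mag_cuts), len(r_bin_edges) - 1))
    for i, x in enumerate(mag_cuts):
        selected = mag < x  # all galaxies brighter than the cut
        if selected.any():
            pdfs[i], _ = np.histogram(
                r_delta[selected], bins=r_bin_edges, density=True
            )
    return pdfs
```

For the binning quoted in the text, `mag_cuts` could be built with `np.arange(-23.0, -18.0 + 1e-9, 0.1)`. Because each cut includes every brighter galaxy, the bright-end rows draw on more objects than a differential bin of the same width would, which is the noise advantage the text refers to.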

Equation (7) is then fit to each magnitude cut, $x_i$, and redshift, $z_j$. In practice, we do not fit for $\Theta$, instead opting to perform the fit in a basis where the parameters have minimal covariance. To achieve this, $\hat{p}(R_\delta|M_r < x_i, z_j)$ is first fit using the original set of parameters, $\Theta$, maximizing the likelihood given by

$$
\mathcal{L} = \mathcal{N}\left(\hat{p}(R_\delta|M_r < x_i, z_j) - p(R_\delta;\Theta(x,z)),\ \hat{\Sigma}_{i,j}\right), \qquad (20)
$$

where $\hat{\Sigma}_{i,j}$ is the covariance matrix of $\hat{p}(R_\delta|M_r < x_i, z_j)$ between each $R_\delta$ bin. The covariance matrix is estimated via jackknife using 125 equal-volume sub-regions of the T1 simulation. This procedure allows the estimation of the parameter covariance matrix, $\bar{T}$, as the mean of the parameter covariance matrices for each redshift and magnitude bin, $T_{i,j}$.
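A delete-one jackknife covariance and the Gaussian likelihood of equation 20 can be written compactly, as in the Python sketch below. It assumes the standard delete-one estimator with prefactor $(N-1)/N$, which the text does not spell out, and the function names are hypothetical. The log-likelihood is given up to an additive constant that does not depend on the model.

```python
import numpy as np


def jackknife_covariance(samples):
    """Delete-one jackknife covariance.

    samples has shape (n_jk, n_bins): row k is the measurement made with
    sub-region k removed (n_jk = 125 in the setup described in the text).
    """
    samples = np.asarray(samples, dtype=float)
    n_jk = samples.shape[0]
    diff = samples - samples.mean(axis=0)
    return (n_jk - 1.0) / n_jk * diff.T @ diff


def gaussian_loglike(p_hat, p_model, cov):
    """Gaussian log-likelihood of the residual, up to a constant."""
    resid = np.asarray(p_hat, dtype=float) - np.asarray(p_model, dtype=float)
    return -0.5 * resid @ np.linalg.solve(cov, resid)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common choice when the covariance is estimated from a limited number of sub-regions and may be poorly conditioned.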


The function $\hat{p}(R_\delta|M_r < x_i, z_j)$ is then fit to each magnitude and redshift bin again, this time performing the maximization over the parameter space defined by

$$
\Theta' = \bar{P}\,\Theta , \qquad (21)
$$

where $\bar{P}$ is the change-of-basis matrix that diagonalizes $\bar{T}$, yielding a set of estimated parameters for each magnitude and redshift bin, $\Theta'_{i,j}$. In order to smoothly interpolate between these as a function of $M_r$ and $z$, a Gaussian process is fit to the set of $\Theta'_{i,j}$. With the Gaussian process model, $\hat{\Theta}'(x,z)$, it is possible to predict $p(R_\delta|M_r < x, z) = p(R_\delta; \bar{P}^{-1}\hat{\Theta}'(x,z))$, where $\bar{P}^{-1}$ maps the interpolated parameters back to the original basis of equation 21. Figure 10 shows the Gaussian process fits to the parameters of this model, and shows the parameter trends with redshift and magnitude.
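One way to construct a change of basis that diagonalizes a mean parameter covariance matrix is through its eigendecomposition, as in the Python sketch below. This is an assumption about how such a matrix could be built, not a statement of the procedure used here, and the function names are hypothetical.

```python
import numpy as np


def decorrelating_basis(T_bar):
    """Return (P, variances) such that P @ T_bar @ P.T is diagonal.

    T_bar is a symmetric parameter covariance matrix. The rows of P are its
    eigenvectors, so Theta' = P @ Theta has uncorrelated components.
    """
    variances, eigvecs = np.linalg.eigh(np.asarray(T_bar, dtype=float))
    return eigvecs.T, variances


def to_original_basis(theta_prime, P):
    """Invert Theta' = P @ Theta; P is orthogonal, so its inverse is P.T."""
    return P.T @ np.asarray(theta_prime, dtype=float)
```

Fitting and interpolating in the rotated basis means each component can be modeled independently, after which `to_original_basis` recovers the parameters that enter the density distribution.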

C. SAMPLING FROM P(Rδ|MR,Z)

Here we describe how we draw samples of densities, $R_\delta$, from $p(R_\delta|M_r,z)$, where $M_r$ and $z$ are the absolute magnitude and redshift of a galaxy in our simulation. It is trivial to convert random samples from a uniform distribution into samples from an arbitrary one-dimensional probability distribution function (PDF), using the cumulative distribution function (CDF), which can be obtained by numerically integrating the PDF. The difficulty in our case is that we do not have direct access to $p(R_\delta|M_r,z)$, but rather to $p(R_\delta|M < M_r,z)$, since this quantity can be measured with significantly less noise in our simulations than $p(R_\delta|M_r,z)$. This is particularly true for the brightest galaxies, since there are few of these in the training simulation. Because we know the average luminosity function in our training simulation, we can convert $p(R_\delta|M < M_r,z)$ to $p(R_\delta|M_r,z)$ using:

$$
p(R_\delta|M_r) = \frac{1}{Z}\left[N(M_r + \delta M_r)\,p(R_\delta|M < M_r + \delta M_r) - N(M_r)\,p(R_\delta|M < M_r)\right] \qquad (22)
$$

$$
= \frac{I(R_\delta|M_r)}{Z}, \qquad (23)
$$

where

$$
Z = \int dR_\delta\, I(R_\delta|M_r) . \qquad (24)
$$

Here we have dropped the $z$ argument of all functions for legibility, and $N(M_r) = \int_{-\infty}^{M_r} dM_r'\,\phi(M_r')$ is the cumulative number density of all galaxies brighter than $M_r$. In practice, we evaluate $p(R_\delta|M_r,z)$ on a grid of redshift and absolute magnitude using Eq. 22. For the $i$th galaxy, we choose the grid point nearest to $M_{r,i}$ and $z_i$ and sample from the appropriate $p(R_\delta|M_r,z)$ to draw a density.
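The two steps described in this appendix, differencing the cumulative distributions and then drawing samples through the CDF, can be sketched in Python as follows. The function names and the use of trapezoidal integration on a tabulated $R_\delta$ grid are assumptions made for illustration; the sketch handles a single magnitude and redshift grid point.

```python
import numpy as np


def _trapezoid_cumulative(y, x):
    """Cumulative trapezoidal integral of y(x), starting from zero."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate([[0.0], np.cumsum(steps)])


def differential_pdf(p_cum_faint, p_cum_bright, n_faint, n_bright, r_grid):
    """Equations 22-24: p(R_delta | M_r) from two cumulative distributions.

    p_cum_faint  = p(R_delta | M < M_r + dM_r) tabulated on r_grid
    p_cum_bright = p(R_delta | M < M_r)        tabulated on r_grid
    n_faint, n_bright = N(M_r + dM_r), N(M_r)
    """
    integrand = n_faint * p_cum_faint - n_bright * p_cum_bright
    Z = _trapezoid_cumulative(integrand, r_grid)[-1]
    return integrand / Z


def sample_from_pdf(pdf, r_grid, n_samples, rng):
    """Inverse-transform sampling from a tabulated one-dimensional PDF."""
    cdf = _trapezoid_cumulative(pdf, r_grid)
    cdf /= cdf[-1]
    u = rng.uniform(size=n_samples)
    # Invert the CDF by linear interpolation of R_delta as a function of CDF.
    return np.interp(u, cdf, r_grid)
```

In a noisy measurement the differenced integrand can dip below zero in sparsely populated bins, in which case the CDF is no longer monotonic; a production implementation would need to guard against that, which this sketch does not attempt.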

D. HALO OCCUPATION STATISTICS IN ADDGALS

Here we discuss the halo occupation statistics of the ADDGALS method compared to the SHAM model, applied to the T1 high-resolution training simulation. The top left panel of fig. 11 shows a comparison of the $p(M_{r,{\rm cen}}|M_{\rm halo})$ as measured in the z = 0 snapshot of the T1 SHAM catalog and the relation in an ADDGALS catalog run on the z = 0 snapshot of the L1 simulation. The agreement seen here is validation that the model in eq. (1) describes this relation well. The small discrepancies seen at the bright end can be explained by differences in the assumed functional forms. The ADDGALS model for $M_{r,{\rm cen}}(M_{\rm vir})$ is a broken power law, which does not perfectly fit the relation measured in the SHAM model at the bright end. The luminosity function employed when creating the SHAM catalog assumes a Schechter function plus a Gaussian component at the bright end, and the Gaussian component leads to a deviation from a power law for $M_r < -22$. The functional form for $M_{r,{\rm cen}}(M_{\rm vir})$ used in ADDGALS was derived in Vale & Ostriker (2004) assuming a pure Schechter function, and is fit well by a pure power law for $M_r < -22$. This difference may give rise to slight discrepancies in the probability that the brightest galaxy in a cluster-mass halo is the central galaxy between the SHAM and ADDGALS models.

We compare the HODs and CLFs measured in the SHAM and ADDGALS catalogs. Since SHAM has been shown to provide a good match to the observed CLF (Reddick et al. 2013), this comparison tests the assumption that the $p(R_\delta|M_r,z)$ relation is sufficient to recover a range of properties of the galaxy distribution and its relation to the underlying halos. In the top right panels of fig. 11, we compare the HOD as measured in the ADDGALS L1 catalog and the SHAM catalog. ADDGALS largely agrees with the SHAM catalog, with some minor differences appearing around $10^{13}\,h^{-1}M_\odot$, where ADDGALS over-predicts abundances of galaxies with respect to SHAM. This is because at masses below the smoothing scale used to measure $R_\delta$ (i.e., $1.8\times10^{13}\,h^{-1}M_\odot$), $p(R_\delta|M_r)$ becomes much broader, and thus is less able to disambiguate between halos of different masses. Due to the power-law halo mass function at low mass, galaxies that should have been placed in low-mass halos can then scatter into higher-mass halos, leading to the excess seen in the ADDGALS HOD measurements compared with the SHAM HODs.

The bottom left panel of fig. 11 shows a comparison of the conditional luminosity function (CLF) of galaxies in bins of halo mass. Again, ADDGALS and the SHAM catalog are largely in agreement with each other, except for the lowest mass bin we consider, where a similar Eddington-like bias is at play. The bottom right panel of fig. 11 shows a comparison of the fraction of galaxies that are satellites as a function of magnitude in the same mass bins as those used to measure the CLF. The satellite fraction for all galaxies in halos with $M_{\rm vir} > 5\times10^{12}\,h^{-1}M_\odot$ is included in black for reference. At bright magnitudes, ADDGALS slightly over-predicts the satellite fraction in the lowest mass bin shown, and slightly under-predicts the bright-end satellite fraction for more massive halos, but these differences are small.

[Figure 10: panels showing $\mu_c(M_r,z)$, $\sigma_c(M_r,z)$, $\mu_f(M_r,z)$, $\sigma_f(M_r,z)$, and $p(M_r,z)$ versus $z$ (0 to 2.5) for $M_r < -21.5$, $-20.8$, $-20.2$, $-19.6$, and $-18.9$.]

Figure 10. Parameters of the model for $p(R_\delta|M_r,z)$, $\sigma_c$, $\sigma_f$, $\mu_c$, $\mu_f$, and $p$, as a function of redshift and magnitude. Lines show the Gaussian process model for this redshift and magnitude dependence.

REFERENCES

Abbott, T. M. C., Abdalla, F. B., Alarcon, A., et al. 2018, PhRvD, 98, 043526

Abbott, T. M. C., Aguena, M., Alarcon, A., et al. 2020a, PhRvD, 102, 023509

—. 2020b, PhRvD, 102, 023509

Aihara, H., et al. 2011, ApJS, 193, 29

Anbajagane, D., Evrard, A. E., Farahi, A., et al. 2020, MNRAS, 495, 686

Annis, J., Soares-Santos, M., Strauss, M. A., et al. 2014, ApJ, 794, 120

Asorey, J., Carrasco Kind, M., Sevilla-Noarbe, I., Brunner, R. J., & Thaler, J. 2016, MNRAS, 459, 1293

Avila, S., Murray, S. G., Knebe, A., et al. 2015, MNRAS, 450, 1856

Balaguera-Antolínez, A., Kitaura, F.-S., Pellejero-Ibáñez, M., Zhao, C., & Abel, T. 2019, MNRAS, 483, L58

Baxter, E., Chang, C., Jain, B., et al. 2017, ApJ, 841, 18

Becker, M. R. 2015, arXiv e-prints, arXiv:1507.03605

Becker, M. R., McKay, T. A., Koester, B., et al. 2007, ApJ, 669, 905

Becker, M. R., Troxel, M. A., MacCrann, N., et al. 2016, PhRvD, 94, 022002

Behroozi, P., Wechsler, R. H., Hearin, A. P., & Conroy, C. 2019, MNRAS, 488, 3143

Behroozi, P. S., Conroy, C., & Wechsler, R. H. 2010, ApJ, 717, 379

Behroozi, P. S., Wechsler, R. H., & Conroy, C. 2013a, ApJ, 770, 57

Behroozi, P. S., Wechsler, R. H., & Wu, H.-Y. 2013b, ApJ, 762, 109

Behroozi, P. S., Wechsler, R. H., Wu, H.-Y., et al. 2013c, ApJ, 763, 18

Benson, A. J. 2012, NewA, 17, 175

Benson, A. J., Frenk, C. S., Lacey, C. G., Baugh, C. M., & Cole, S. 2002, MNRAS, 333, 177

Berger, P., & Stein, G. 2019, MNRAS, 482, 2861

Berlind, A. A., & Weinberg, D. H. 2002, ApJ, 575, 587

Bernardi, M., Meert, A., Sheth, R. K., et al. 2013, MNRAS, 436, 697

Blanton, M. R., Brinkmann, J., Csabai, I., et al. 2003a, AJ, 125, 2348

Blanton, M. R., Hogg, D. W., Bahcall, N. A., et al. 2003b, ApJ, 592, 819

Blanton, M. R., Schlegel, D. J., Strauss, M. A., et al. 2005, AJ, 129, 2562

Bleem, L. E., van Engelen, A., Holder, G. P., et al. 2012, ApJ, 753, L9


[Figure 11: four groups of panels. Top left: $M_{r,{\rm cen}}$ [$h^{-1}$ mag] versus $M_{\rm vir}$ [$h^{-1}M_\odot$]. Top right: $\langle N|M_{\rm vir}\rangle$ versus $M_{\rm vir}$ [$h^{-1}M_\odot$] for $M_r < -22$, $-21$, $-20$, and $-19$. Bottom left: $\phi(M_r|M_{\rm vir})$ [$h^{3}\,{\rm Mpc}^{-3}$] versus $M_r$ [$h^{-1}$ mag] for $\langle\log_{10}M_{\rm vir}\rangle = 12.9$, 13.6, 14.4, and 14.8. Bottom right: $f_{\rm sat}(M_r|M_{\rm vir})$ versus $M_r$ [$h^{-1}$ mag] in the same four mass bins.]

Figure 11. Comparison of halo occupation statistics between the ADDGALS L1 model (blue) and the SHAM model it is tuned to (red; based on the T1 simulation). Top left: Central galaxy r-band absolute magnitude as a function of host halo mass. Top right: Halo occupation distribution for four magnitude-limited samples. Solid lines show total HODs, dashed lines show central HODs, and dash–dotted lines show satellite HODs. Error bars shown are the jackknife error bars for each catalog. Bottom left: Conditional luminosity function for four bins in halo mass (the average $\log(h^{-1}M_\odot)$ in the bin is labeled in each panel). Dashed lines are central luminosity functions, dash–dotted lines are satellite luminosity functions, and solid lines are the sum of the two. Error bars indicate the jackknife error bars for each catalog. Bottom right: Satellite fraction for galaxies in halos in four mass bins as a function of r-band absolute magnitude. For reference, the black line shows the satellite fraction in the T1 SHAM catalog for all halos with $\log_{10}M_{\rm vir} > 12.8$.


Bleem, L. E., Stalder, B., de Haan, T., et al. 2015, The Astrophysical Journal Supplement Series, 216, 27

Bond, J. R., & Myers, S. T. 1996, ApJS, 103, 1

Bonnett, C., Troxel, M. A., Hartley, W., et al. 2016, PhRvD, 94, 042005

Bower, R. G., Benson, A. J., Malbon, R., et al. 2006, MNRAS, 370, 645

Bradshaw, A. K. 2019, in American Astronomical Society Meeting Abstracts, Vol. 233, American Astronomical Society Meeting Abstracts #233, 376.01

Bryan, G. L., & Norman, M. L. 1998, ApJ, 495, 80

Buchs, R., Davis, C., Gruen, D., et al. 2019, MNRAS, 489, 820

Bullock, J. S., Wechsler, R. H., & Somerville, R. S. 2002, MNRAS, 329, 246

Cai, Y.-C., Angulo, R. E., Baugh, C. M., et al. 2009, MNRAS, 395, 1185

Cawthon, R., Davis, C., Gatti, M., et al. 2018, MNRAS, 481, 2427

Cawthon, R., Elvin-Poole, J., Porredon, A., et al. 2020, arXiv e-prints, arXiv:2012.12826

Chang, C., & Jain, B. 2014, MNRAS, 443, 102

Chang, C., Vikram, V., Jain, B., et al. 2015, PhRvL, 115, 051301

Chang, C., Pujol, A., Mawdsley, B., et al. 2018a, MNRAS, 475, 3165

—. 2018b, MNRAS, 475, 3165

Chaves-Montero, J., Angulo, R. E., Schaye, J., et al. 2016, MNRAS, 460, 3100

Chuang, C.-H., Kitaura, F.-S., Prada, F., Zhao, C., & Yepes, G. 2015, MNRAS, 446, 2621

Clampitt, J., Sánchez, C., Kwan, J., et al. 2017, MNRAS, 465, 4204

Cole, S., Lacey, C. G., Baugh, C. M., & Frenk, C. S. 2000, MNRAS, 319, 168

Conroy, C., Wechsler, R. H., & Kravtsov, A. V. 2006a, ApJ, 647, 201

—. 2006b, ApJ, 647, 201

Contreras, S., Angulo, R., & Zennaro, M. 2020, arXiv e-prints, arXiv:2012.06596

Cool, R. J., Eisenstein, D. J., Kochanek, C. S., et al. 2012, ApJ, 748, 10

Cooray, A. 2006, MNRAS, 365, 842

Crocce, M., Castander, F. J., Gaztañaga, E., Fosalba, P., & Carretero, J. 2015, MNRAS, 453, 1513

Crocce, M., Pueblas, S., & Scoccimarro, R. 2006, MNRAS, 373, 369

Croton, D. J., Stevens, A. R. H., Tonini, C., et al. 2016, ApJS, 222, 22

Cunha, C. E., Huterer, D., Busha, M. T., & Wechsler, R. H. 2012, MNRAS, 423, 909

Cunha, C. E., Huterer, D., Lin, H., Busha, M. T., & Wechsler, R. H. 2014, MNRAS, 444, 129

Dai, B., & Seljak, U. 2021, Proceedings of the National Academy of Science, 118, 2020324118

Davies, L. J. M., Maraston, C., Thomas, D., et al. 2013, MNRAS, 434, 296

Davis, C., Rozo, E., Roodman, A., et al. 2018, MNRAS, 477, 2196

DeRose, J., Becker, M. R., & Wechsler, R. H. 2021a, in preparation

DeRose, J., Wechsler, R. H., Becker, M. R., et al. 2019a, ApJ, submitted

DeRose, J., Wechsler, R. H., Becker, M. R., & the DES Collaboration. 2021b, in preparation

DeRose, J., Wechsler, R. H., Tinker, J. L., et al. 2019b, ApJ, 875, 69

DES Collaboration, Abbott, T. M. C., Adamow, M., et al. 2021, arXiv e-prints, arXiv:2101.05765

Diemer, B., & Joyce, M. 2019, ApJ, 871, 168

Dietrich, J. P., Zhang, Y., Song, J., et al. 2014, MNRAS, 443, 1713

Dong, F., Pierpaoli, E., Gunn, J. E., & Wechsler, R. H. 2008, ApJ, 676, 868

Eke, V. R., Frenk, C. S., Baugh, C. M., et al. 2004, MNRAS, 355, 769

Erickson, B. M. S., Singh, R., & Evrard, A. E. 2012, in Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond, XSEDE ’12 (New York, NY, USA: ACM), 34:1–34:8

Farahi, A., Evrard, A. E., Rozo, E., Rykoff, E. S., & Wechsler, R. H. 2016, MNRAS, 460, 3900

Feng, Y., Chu, M.-Y., Seljak, U., & McDonald, P. 2019, FastPM: Scaling N-body Particle Mesh solver, ascl:1905.010

Fosalba, P., Crocce, M., Gaztañaga, E., & Castander, F. J. 2015, MNRAS, 448, 2987

Friedrich, O., Gruen, D., DeRose, J., et al. 2018a, PhRvD, 98, 023508

—. 2018b, PhRvD, 98, 023508

Gatti, M., Vielzeuf, P., Davis, C., et al. 2018a, MNRAS, 477, 1664

—. 2018b, MNRAS, 477, 1664

Gatti, M., Giannini, G., Bernstein, G. M., et al. 2020, arXiv e-prints, arXiv:2012.08569

Gerdes, D. W., Sypniewski, A. J., McKay, T. A., et al. 2010, ApJ, 715, 823

Gerke, B. F., Wechsler, R. H., Behroozi, P. S., et al. 2013, The Astrophysical Journal Supplement Series, 208, 1

Gill, M. S. S., Young, J. C., Draskovic, J. P., et al. 2009, arXiv e-prints, arXiv:0909.3856

Gruen, D., Friedrich, O., Krause, E., et al. 2018a, PhRvD, 98, 023507

—. 2018b, PhRvD, 98, 023507

Guo, Q., White, S., Angulo, R. E., et al. 2013, MNRAS, 428, 1351

Hansen, S. M., Sheldon, E. S., Wechsler, R. H., & Koester, B. P. 2009a, ApJ, 699, 1333


—. 2009b, ApJ, 699, 1333

Hao, J., McKay, T. A., Koester, B. P., et al. 2010, ApJS, 191, 254

Harnois-Déraps, J., Amon, A., Choi, A., et al. 2018, MNRAS, 481, 1337

Hearin, A., Korytov, D., Kovacs, E., et al. 2020, MNRAS, 495, 5040

Hearin, A. P., & Watson, D. F. 2013, MNRAS, 435, 1313

Hearin, A. P., Watson, D. F., Becker, M. R., et al. 2014a, MNRAS, 444, 729

—. 2014b, MNRAS, 444, 729

High, F. W., Hoekstra, H., Leethochawalit, N., et al. 2012, ApJ, 758, 68

Hoyle, B., Gruen, D., Bernstein, G. M., et al. 2018, MNRAS, 478, 592

Izard, A., Fosalba, P., & Crocce, M. 2018, MNRAS, 473, 3051

Jarrett, T. H., Cohen, M., Masci, F., et al. 2011, ApJ, 735, 112

Jarvis, M. J., Bonfield, D. G., Bruce, V. A., et al. 2013, MNRAS, 428, 1281

Jiang, F., Dekel, A., Freundlich, J., et al. 2021, MNRAS, 502, 621

Jiang, F., & van den Bosch, F. C. 2016, MNRAS, 458, 2848

Jing, Y. P. 1998, ApJ, 503, L9

Johnston, D. E., Sheldon, E. S., et al. 2007, in preparation

Kauffmann, G., White, S. D. M., & Guiderdoni, B. 1993, MNRAS, 264, 201

Kitaura, F.-S., Rodríguez-Torres, S., Chuang, C.-H., et al. 2016, MNRAS, 456, 4156

Knebe, A., Knollmann, S. R., Muldrew, S. I., et al. 2011, MNRAS, 415, 2293

Koester, B. P., McKay, T. A., Annis, J., et al. 2007a, ApJ, 660, 239

—. 2007b, ApJ, 660, 221

Korytov, D., Hearin, A., Kovacs, E., et al. 2019, ApJS, 245, 26

Krause, E., Eifler, T. F., Zuntz, J., et al. 2017, arXiv e-prints, arXiv:1706.09359

Kravtsov, A. V., Berlind, A. A., Wechsler, R. H., et al. 2004, ApJ, 609, 35

Kravtsov, A. V., Vikhlinin, A. A., & Meshcheryakov, A. V. 2018, Astronomy Letters, 44, 8

Landy, S. D., & Szalay, A. S. 1993, ApJ, 412, 64

Lehmann, B. V., Mao, Y.-Y., Becker, M. R., Skillman, S. W., & Wechsler, R. H. 2017, ApJ, 834, 37

Leistedt, B., Peiris, H. V., Elsner, F., et al. 2016, The Astrophysical Journal Supplement Series, 226, 24

Lewis, A. 2004, PhRvD, 70, 043011

MacCrann, N., DeRose, J., Wechsler, R. H., et al. 2018, MNRAS, 480, 4614

Malz, A. I., Marshall, P. J., DeRose, J., et al. 2018, AJ, 156, 35

Mandelbaum, R., Seljak, U., Kauffmann, G., Hirata, C. M., & Brinkmann, J. 2006, MNRAS, 368, 715

Manera, M., Scoccimarro, R., Percival, W. J., et al. 2013, MNRAS, 428, 1036

Mao, Y.-Y., Williamson, M., & Wechsler, R. H. 2015, ApJ, 810, 21

Mao, Y.-Y., Kovacs, E., Heitmann, K., et al. 2018, The Astrophysical Journal Supplement Series, 234, 36

Martens, D., Fang, X., Troxel, M. A., et al. 2019, MNRAS, 485, 211

Masaki, S., Lin, Y.-T., & Yoshida, N. 2013, MNRAS, 436, 2286

Massara, E., Ho, S., Hirata, C. M., et al. 2020, arXiv e-prints, arXiv:2010.00047

McMahon, R. G., Banerji, M., Gonzalez, E., et al. 2013, The Messenger, 154, 35

Merson, A. I., Baugh, C. M., Helly, J. C., et al. 2013, MNRAS, 429, 556

Miller, C. J., Nichol, R. C., Reichart, D., et al. 2005, AJ, 130, 968

Modi, C., Feng, Y., & Seljak, U. 2018, JCAP, 2018, 028

Monaco, P., Sefusatti, E., Borgani, S., et al. 2013, MNRAS, 433, 2389

Moster, B. P., Macciò, A. V., Somerville, R. S., Naab, T., & Cox, T. J. 2011, MNRAS, 415, 3750

Moster, B. P., Naab, T., & White, S. D. M. 2018, MNRAS, 477, 1822

Myles, J., Alarcon, A., Amon, A., et al. 2020, arXiv e-prints, arXiv:2012.08566

Myles, J., Gruen, D., Mantz, A. B., et al. 2021, MNRAS, arXiv:2011.07070

Navarro, J. F., Frenk, C. S., & White, S. D. M. 1996, ApJ, 462, 563

Nord, B., Amara, A., Réfrégier, A., et al. 2016, Astronomy and Computing, 15, 1

Pandey, S., Baxter, E. J., Xu, Z., et al. 2019, PhRvD, 100, 063519

Park, Y., Krause, E., Dodelson, S., et al. 2016, PhRvD, 94, 063533

Ramanah, D. K., Lavaux, G., Jasche, J., & Wandelt, B. D. 2019, A&A, 621, A69

Reddick, R. M., Wechsler, R. H., Tinker, J. L., & Behroozi, P. S. 2013, ApJ, 771, 30

Rozo, E., Rykoff, E., Koester, B., et al. 2011, ApJ, 740, 53

Rozo, E., Wechsler, R. H., Koester, B. P., Evrard, A. E., & McKay, T. A. 2007a, ArXiv Astrophysics e-prints, arXiv:astro-ph/0703574

Rozo, E., Wechsler, R. H., Koester, B. P., et al. 2007b, ArXiv Astrophysics e-prints, astro-ph/0703571

Rykoff, E. S., Rozo, E., & Keisler, R. 2015, arXiv e-prints, arXiv:1509.00870

Rykoff, E. S., Rozo, E., Busha, M. T., et al. 2014, ApJ, 785, 104

Safonova, S., Norberg, P., & Cole, S. 2021, MNRAS, arXiv:2009.00005

Saito, S., Leauthaud, A., Hearin, A. P., et al. 2016, MNRAS, 460, 1457

Sánchez, C., Clampitt, J., Kovacs, A., et al. 2017, MNRAS, 465, 746


Saunders, W., Smedley, S., Gillingham, P., et al. 2014, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 9150, Modeling, Systems Engineering, and Project Management for Astronomy VI, 915023

Schmidt, S. J., Malz, A. I., Soo, J. Y. H., et al. 2020, MNRAS, 499, 1587

Scoccimarro, R., & Sheth, R. K. 2002, MNRAS, 329, 629

Seljak, U. 2000, MNRAS, 318, 203

Sheldon, E. S., Johnston, D. E., Masjedi, M., et al. 2009, ApJ, 703, 2232

Shin, T., Adhikari, S., Baxter, E. J., et al. 2019, MNRAS, 487, 2900

Smith, A., Cole, S., Baugh, C., et al. 2017, MNRAS, 470, 4646

Soares-Santos, M., de Carvalho, R. R., Annis, J., et al. 2011, ApJ, 727, 45

Somerville, R. S., & Primack, J. R. 1999, MNRAS, 310, 1087

Somerville, R. S., Olsen, C., Yung, L. Y. A., et al. 2021, MNRAS, 502, 4858

Sousbie, T., Courtois, H., Bryan, G., & Devriendt, J. 2008, ApJ, 678, 569

Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629

Stein, G., Alvarez, M. A., Bond, J. R., van Engelen, A., & Battaglia, N. 2020, JCAP, 2020, 012

Sutherland, W. 2012, in Science from the Next Generation Imaging and Spectroscopic Surveys

Szepietowski, R. M., Bacon, D. J., Dietrich, J. P., et al. 2014, MNRAS, 440, 2191

Tassev, S., Zaldarriaga, M., & Eisenstein, D. J. 2013, Journal of Cosmology and Astro-Particle Physics, 2013, 036

Tinker, J. L., Sheldon, E. S., Wechsler, R. H., et al. 2012, ApJ, 745, 16

To, C.-H., Reddick, R. M., Rozo, E., Rykoff, E., & Wechsler, R. H. 2020, ApJ, 897, 15

To, C.-H., Krause, E., Rozo, E., et al. 2021, MNRAS, 502, 4093

Tröster, T., Ferguson, C., Harnois-Déraps, J., & McCarthy, I. G. 2019, MNRAS, 487, L24

Troxel, M. A., MacCrann, N., Zuntz, J., et al. 2018, PhRvD, 98, 043528

Vale, A., & Ostriker, J. P. 2004, MNRAS, 353, 189

—. 2006, MNRAS, 371, 1173

van den Bosch, F. C., & Ogiya, G. 2018, MNRAS, 475, 4066

van den Bosch, F. C., Ogiya, G., Hahn, O., & Burkert, A. 2018, MNRAS, 474, 3043

van den Bosch, F. C., Yang, X., Mo, H. J., et al. 2007, MNRAS, 376, 841

VanderPlas, J. T., Connolly, A. J., Jain, B., & Jarvis, M. 2012, ApJ, 744, 180

Varga, T. N., DeRose, J., Gruen, D., et al. 2019a, MNRAS, 489, 2511

—. 2019b, MNRAS, 489, 2511

Vogelsberger, M., Marinacci, F., Torrey, P., & Puchwein, E. 2020, Nature Reviews Physics, 2, 42

Watson, D. F., Hearin, A. P., Berlind, A. A., et al. 2015, MNRAS, 446, 651

Wechsler, R. H. 2004, in Clusters of Galaxies: Probes of Cosmological Structure and Galaxy Evolution, ed. J. S. Mulchaey, A. Dressler, & A. Oemler

Wechsler, R. H., & Tinker, J. L. 2018, Annual Review of Astronomy and Astrophysics, 56, 435

Weinberg, D. H., Mortonson, M. J., Eisenstein, D. J., et al. 2013, PhR, 530, 87

Wetzel, A. R., & White, M. 2010, MNRAS, 403, 1072

White, M., Tinker, J. L., & McBride, C. K. 2014, MNRAS, 437, 2594

White, S. D. M., & Frenk, C. S. 1991, ApJ, 379, 52

Yamamoto, M., Masaki, S., & Hikage, C. 2015, arXiv e-prints, arXiv:1503.03973

Yan, R., White, M., & Coil, A. L. 2004, ApJ, 607, 739

Yang, S., Du, X., Benson, A. J., Pullen, A. R., & Peter, A. H. G. 2020, MNRAS, 498, 3902

Yang, X., Mo, H. J., & van den Bosch, F. C. 2003a, MNRAS, 339, 1057

—. 2003b, MNRAS, 339, 1057

Zehavi, I., Zheng, Z., Weinberg, D. H., et al. 2011a, ApJ, 736, 59

—. 2011b, ApJ, 736, 59

Zhang, X., Wang, Y., Zhang, W., et al. 2019, arXiv e-prints, arXiv:1902.05965

Zheng, Z., Coil, A. L., & Zehavi, I. 2007, ApJ, 667, 760

Zheng, Z., Berlind, A. A., Weinberg, D. H., et al. 2005, ApJ, 633, 791

Zu, Y., & Mandelbaum, R. 2015, MNRAS, 454, 1161
