Computing High Accuracy Power Spectra with Pico

7
arXiv:0712.0194v1 [astro-ph] 2 Dec 2007 Draft version February 20, 2013 Preprint typeset using L A T E X style emulateapj v. 08/22/09 COMPUTING HIGH ACCURACY POWER SPECTRA WITH PICO William A. Fendt 1 and Benjamin D. Wandelt 1,2,3 Draft version February 20, 2013 ABSTRACT This paper presents the second release of Pico (Parameters for the Impatient COsmologist). Pico is a general purpose machine learning code which we have applied to computing the CMB power spectra and the WMAP likelihood. For this release, we have made improvements to the algorithm as well as the data sets used to train Pico, leading to a significant improvement in accuracy. For the 9 parameter nonflat case presented here Pico can on average compute the TT, TE and EE spectra to better than 1% of cosmic standard deviation for nearly all values over a large region of parameter space. Performing a cosmological parameter analysis of current CMB and large scale structure data, we show that these power spectra give very accurate 1 and 2 dimensional parameter posteriors. We have extended Pico to allow computation of the tensor power spectrum and the matter transfer function. Pico runs about 1500 times faster than CAMB at the default accuracy and about 250, 000 times faster at high accuracy. Training Pico can be done using massively parallel computing resources, including distributed computing projects such as Cosmology@Home. On the homepage for Pico, located at http://cosmos.astro.uiuc.edu/pico, we provide new sets of regression coefficients and make the training code available for public use. Subject headings: cosmic microwave background — cosmology: observations — methods: numerical 1. INTRODUCTION Given the quantity of data available from current ex- periments such as WMAP and SDSS as well as the prospects for the next generation Planck and DES ex- periments on the horizon, there is growing need for cos- mologists to develop tools that can accurately inter- pret the flood of data these experiments will gather. A key component in all such analysis is the exploration of the posterior density of the cosmological parameters given the available data. This allows us to constrain and test theoretical models of the Universe. A ma- jor computational hurdle in this procedure is the abil- ity to quickly and accurately compute the power spec- trum of CMB fluctuations and the matter transfer func- tion. This is accomplished using codes such as CMBFast (Seljak & Zaldarriaga 1996) and CAMB (Lewis et al. 2000) that evolve the Boltzmann equation for the vari- ous constituents of the Universe. Using the default accu- racy settings in CAMB the calculation of a single power spectrum takes on the order of a minute. At higher settings, as may be required by upcoming CMB experi- ments, the computational time can jump to several hours on a modern desktop. Furthermore, computation of con- straints based on the data requires evaluating the power spectrum at O ( 10 5 - 10 6 ) models. Decreasing the time required to calculate the CMB power spectrum, while maintaining sub-cosmic variance accuracy, will play an important part of turning raw data into quantitative in- formation about the history and structure of the Uni- verse. Previous codes aimed at speeding up power spec- 1 Department of Physics, UIUC, 1110 W Green Street, Urbana, IL 61801; [email protected] 2 Department of Astronomy, UIUC, 1002 W Green Street, Ur- bana, IL 61801; [email protected] 3 Center for Advanced Studies, UIUC, 912 W Illinois Street, Ur- bana, IL 61801 trum computations such as DASH (Kaplinghat et al. 2002) and CMBwarp (Jimenez et al. 2004) have at- tempted to reproduce the computation of CMBFast or CAMB. More recently, motivated to develop a code that was both faster than DASH and more accurate than CMBwarp, we released a machine learning code called Pico (Fendt & Wandelt 2007). It uses a training set of power spectra from CAMB to fit several multi- variate polynomials as a function of the input parame- ters. Along with accurately fitting the power spectra, we also found that Pico was able to directly fit the WMAP likelihood (Spergel et al. 2007; Hinshaw et al. 2007; Page et al. 2007). This was done previously by CMBFit (Sandvik et al. 2004) which used a similar idea of fitting the likelihood with a polynomial in the cosmo- logical parameters. By replacing CAMB and the WMAP likelihood code with Pico we demonstrated that it can quickly explore the parameter space and give nearly iden- tical posteriors. Since the first release of Pico, Auld et al. have applied a neural network code called CosmoNet to flat (Auld et al. 2006) and nonflat (Auld et al. 2007) models. Also Habib et al. have demonstrated that a Gaussian process model introduced in (Heitmann et al. 2006) can predict the CMB power spectrum for flat mod- els based on a very small training set (Habib et al. 2007). Here we discuss some improvements to increase the ac- curacy of Pico and extend its to application to more gen- eral cosmological models. Further, we describe a training method that avoids serial runs of CAMB and is designed to leverage access to massively parallel and even dis- tributed computing resources. For example, we demon- strate that Pico can be trained using the thousands of geographically distinct hosts that contribute to Cosmol- ogy@Home. 4 This paper is organized as follows. Section 2 summa- 4 http://www.cosmologyathome.org

Transcript of Computing High Accuracy Power Spectra with Pico

arX

iv:0

712.

0194

v1 [

astr

o-ph

] 2

Dec

200

7Draft version February 20, 2013Preprint typeset using LATEX style emulateapj v. 08/22/09

COMPUTING HIGH ACCURACY POWER SPECTRA WITH PICO

William A. Fendt1 and Benjamin D. Wandelt1,2,3

Draft version February 20, 2013

ABSTRACT

This paper presents the second release of Pico (Parameters for the Impatient COsmologist). Picois a general purpose machine learning code which we have applied to computing the CMB powerspectra and the WMAP likelihood. For this release, we have made improvements to the algorithm aswell as the data sets used to train Pico, leading to a significant improvement in accuracy. For the 9parameter nonflat case presented here Pico can on average compute the TT, TE and EE spectra tobetter than 1% of cosmic standard deviation for nearly all ℓ values over a large region of parameterspace. Performing a cosmological parameter analysis of current CMB and large scale structure data, weshow that these power spectra give very accurate 1 and 2 dimensional parameter posteriors. We haveextended Pico to allow computation of the tensor power spectrum and the matter transfer function.Pico runs about 1500 times faster than CAMB at the default accuracy and about 250, 000 times fasterat high accuracy. Training Pico can be done using massively parallel computing resources, includingdistributed computing projects such as Cosmology@Home. On the homepage for Pico, located athttp://cosmos.astro.uiuc.edu/pico, we provide new sets of regression coefficients and make thetraining code available for public use.

Subject headings: cosmic microwave background — cosmology: observations — methods: numerical

1. INTRODUCTION

Given the quantity of data available from current ex-periments such as WMAP and SDSS as well as theprospects for the next generation Planck and DES ex-periments on the horizon, there is growing need for cos-mologists to develop tools that can accurately inter-pret the flood of data these experiments will gather. Akey component in all such analysis is the explorationof the posterior density of the cosmological parametersgiven the available data. This allows us to constrainand test theoretical models of the Universe. A ma-jor computational hurdle in this procedure is the abil-ity to quickly and accurately compute the power spec-trum of CMB fluctuations and the matter transfer func-tion. This is accomplished using codes such as CMBFast(Seljak & Zaldarriaga 1996) and CAMB (Lewis et al.2000) that evolve the Boltzmann equation for the vari-ous constituents of the Universe. Using the default accu-racy settings in CAMB the calculation of a single powerspectrum takes on the order of a minute. At highersettings, as may be required by upcoming CMB experi-ments, the computational time can jump to several hourson a modern desktop. Furthermore, computation of con-straints based on the data requires evaluating the powerspectrum at O

(

105 − 106)

models. Decreasing the timerequired to calculate the CMB power spectrum, whilemaintaining sub-cosmic variance accuracy, will play animportant part of turning raw data into quantitative in-formation about the history and structure of the Uni-verse.

Previous codes aimed at speeding up power spec-

1 Department of Physics, UIUC, 1110 W Green Street, Urbana,IL 61801; [email protected]

2 Department of Astronomy, UIUC, 1002 W Green Street, Ur-bana, IL 61801; [email protected]

3 Center for Advanced Studies, UIUC, 912 W Illinois Street, Ur-bana, IL 61801

trum computations such as DASH (Kaplinghat et al.2002) and CMBwarp (Jimenez et al. 2004) have at-tempted to reproduce the computation of CMBFast orCAMB. More recently, motivated to develop a codethat was both faster than DASH and more accuratethan CMBwarp, we released a machine learning codecalled Pico (Fendt & Wandelt 2007). It uses a trainingset of power spectra from CAMB to fit several multi-variate polynomials as a function of the input parame-ters. Along with accurately fitting the power spectra,we also found that Pico was able to directly fit theWMAP likelihood (Spergel et al. 2007; Hinshaw et al.2007; Page et al. 2007). This was done previously byCMBFit (Sandvik et al. 2004) which used a similar ideaof fitting the likelihood with a polynomial in the cosmo-logical parameters. By replacing CAMB and the WMAPlikelihood code with Pico we demonstrated that it canquickly explore the parameter space and give nearly iden-tical posteriors. Since the first release of Pico, Auld etal. have applied a neural network code called CosmoNetto flat (Auld et al. 2006) and nonflat (Auld et al. 2007)models. Also Habib et al. have demonstrated that aGaussian process model introduced in (Heitmann et al.2006) can predict the CMB power spectrum for flat mod-els based on a very small training set (Habib et al. 2007).Here we discuss some improvements to increase the ac-curacy of Pico and extend its to application to more gen-eral cosmological models. Further, we describe a trainingmethod that avoids serial runs of CAMB and is designedto leverage access to massively parallel and even dis-tributed computing resources. For example, we demon-strate that Pico can be trained using the thousands ofgeographically distinct hosts that contribute to [email protected]

This paper is organized as follows. Section 2 summa-

4 http://www.cosmologyathome.org

2

rizes some of the improvements to the Pico algorithmand how training sets are generated. In section 3 wedemonstrate the performance of Pico in computing thepower spectrum, matter transfer function and WMAPlikelihood for a 9 parameter nonflat model. We showthat Pico can be used to quickly explore the parameterposterior for this model while accurately reproducing the1 and 2 dimensional marginalized distributions. Lastlywe summarize and conclude in section 4.

2. THE ALGORITHM

2.1. Overview

Given a training set of Cosmological parameters x andCMB power spectra y, Pico models the function y =f (x) in 3 steps. First it compresses the power spectrausing a Karhunen-Loeve (Karhunen 1946; Loeve 1955;Tegmark & Bunn 1995) technique. For ℓmax = 3000, thetemperature and polarization spectra due to scalar andtensor perturbations can be compressed to ∼ 180 totalnumbers. Next the algorithm clusters the input param-eters into non-overlapping regions. This is done usinga k-means algorithm (MacQueen 1967; Kirby 2001) orby choosing hyperplanes to manually partition the space.Lastly, Pico models the function by fitting a least-squarespolynomial to the compressed power spectra over eachcluster. Details of the algorithm can be found in theappendix of Fendt & Wandelt (2007).

2.2. Improved Fitting

The first change to the algorithm is that it now uses theLAPACK libraries 5 to perform matrix decomposition.LAPACK makes the algorithm more stable, allowing theuse of higher order polynomials. This gives Pico betterfits at low ℓ (< 200) and high ℓ (> 1500). This also makesclustering significantly less important but still useful forimproving the low ℓ accuracy as well as improving thefit to the WMAP likelihood directly. In particular, wehave found that partitioning the training set along valuesof constant Ωk improves the low ℓ accuracy in the tem-perature power spectrum and matter transfer function,while partitioning in τ , the reionization depth, gives alarge improvement in computing the polarization powerspectrum at low ℓ.

2.3. Decreasing Numerical Noise

For nonflat models, fitting algorithms are hindered bythe fact that at the default accuracy the power spec-tra computed by CAMB are numerically noisy. This isdemonstrated in Figure 1, where we have plotted thepower spectrum at the default and high accuracy lev-els for various ℓ-values as a function of a single param-eter (Ωbh2) for a nonflat model. Since the power spec-trum is not a numerically smooth function of the cos-mological parameters, Pico is limited in its ability to fitthe true spectra. Also note that the higher accuracypower spectrum does not always smooth over the loweraccuracy case. To reduce this noise and limit the ef-fect of interpolation errors we have generated trainingsets with CAMB using the accuracy parameters set as:Accuracy=3, lAccuracy=3 and lSamp=3. In Figure 2 we

5 http://www.netlib.org/lapack/

1390

1400

1410

1420

C50

Acc=1Acc=3

Pico

2380

2420

2460

2500

C50

0

960

965

970

975

980

985

0.021 0.022 0.023

C10

00

Ωbh2 0.021 0.022 0.023

650

656

662

668

C15

00

Ωbh2

Fig. 1.— The plot shows the value of the temperature spectrumas a function of Ωbh2 at various ℓ-values for a nonflat cosmology.The red (+) points correspond to to the default CAMB accuracies(1,1,1) and the green () points correspond to higher accuracysettings (3,3,1). While at low ℓ the power spectrum is smooth, thedefault accuracy becomes numerically noisy at higher ℓ. This isone reason adding Ωk as a free parameter increases the difficultyin fitting the power spectrum. Also plotted, as a blue line, isthe power spectrum computed by Pico trained on the 9 parameternonflat model discussed in section 3.

-1

-0.5

0

0.5

1

0 500 1000 1500 2000 2500

% E

rror

CAMB Acc=1Pico

-1

-0.5

0

0.5

1

0 500 1000 1500 2000 2500

% E

rror

CAMB Acc=1Pico

Fig. 2.— The plots show the percent error between the TT (left)and EE (right) power spectrum computed by CAMB at the defaultaccuracy compared to those computed at high accuracy . Alsoshown is the percent error between the power spectra computedby Pico and the high accuracy CAMB spectra. This test was doneon 25 models all located within 25 log-likelihoods of the WMAPpeak.

have plotted the error between the default and high ac-curacy power spectra from CAMB for 25 models aroundthe peak of the WMAP likelihood. The left and rightplots show the percent error in the TT and EE spectra.Also shown is the error between the spectra computed byPico and the high accuracy CAMB runs. Here Pico wastrained on the 9 parameter model discussed in section 3.The figures demonstrate that when trained on high accu-racy data Pico computes the power spectrum around thepeak of the likelihood to better precision than CAMB atits default accuracy settings.

Numerical noise in the power spectrum also leads tonoise in the WMAP likelihood as shown in Figure 3.Again, this increases the difficulty in fitting the likeli-hood with Pico. Just as the power spectrum, this isremedied by running CAMB at higher accuracy. Whilethe level of noise introduced in the likelihood may not beof significant concern when analyzing current CMB datasets, it may represent an important hurdle to overcomefor the next generation of experiments. Also evident fromFigure 3 is that Pico provides a smooth approximationto the noisy likelihood. This is an important property

3

1769

1770

1771

1772

1773

1774

1775

1776

0.0205 0.021 0.0215 0.022 0.0225 0.023

-Ln

L WM

AP

Ωbh2

Accuracy = 1Accuracy = 3

Pico

Fig. 3.— Value of − lnLWMAP as a function of Ωbh2 for a non-flat cosmology. Note that this is near the peak of the likelihoodin the full space. The red (+) points correspond to the defaultCAMB accuracies (1,1,1) and the green () points correspondingto using higher accuracy settings (3,3,1). The blue line is the valuecomputed by Pico trained over the 9 parameter nonflat models dis-cussed in section 3. Note that using the default accuracy in CAMBgives a numerically noisy function, which can lead to variations of 1or more log-likelihoods, but Pico gives a smooth function throughthe high accuracy values.

for algorithms that require differentiating the likelihoodsuch as Hamiltonian Monte Carlo (Hajian 2007).

2.4. Generating the Training Set

As in the first release of Pico, we generate the train-ing set of power spectra and matter transfer function bysampling uniformly from a large box. After training Picoto compute the power spectra, evaluation of the WMAPlikelihood, which requires a few seconds, becomes thenew bottleneck in parameter estimation. Another signif-icant speed up can be obtained by using Pico to directlyfit the likelihood function. However, for the 9 parame-ter case we examine in the next section this is a difficultproblem. Training Pico over a box in parameter space in-cludes regions that are many thousands of log-likelihoodsfrom the peak and gives a very sparse sampling of thehigh likelihood region. In practice we are not interestedin these areas of parameter space. Instead we aim tocompute the likelihood very accurately around the peakof the distribution. This requires generating a trainingset for Pico that includes only the high likelihood region.

A natural method of accomplishing this is to use theMetropolis Hastings algorithm to find points in the highlikelihood region. This can be done efficiently by runningCosmoMC Lewis & Bridle (2002) and using Pico to com-pute the power spectra. To ensure we cover a sufficientvolume the chains are run using only the WMAP dataand at a higher temperature, meaning the log-likelihoodis scaled by a constant factor allowing the chains to ex-plore a larger volume. This step is dominated by thetime it takes to run the WMAP likelihood code. Lastlywe run the samples through CAMB and the WMAP codeto get the true likelihood which will be used to retrainPico. This step is also quick as it can easily be run inparallel. It is useful to note that this procedure neverrequires running CAMB in serial making it an ideal ap-plication for distributed computing projects such as Cos-mology@Home. The training set can be further refined

by pruning out data at low likelihood. In the followingPico is trained to compute the likelihood on points within25 log-likelihoods of the WMAP peak.

2.5. Polynomial Hierarchy

The process of generating the training set outlined insection 2.4 has the added benefit of giving us a set ofpower spectra constrained around the peak of the like-lihood. We would like to make use of these points byadding them to the power spectra training set. Howeveradding a large weighting of points to a small region of thebox will have a negative effet on the accuracy of the algo-rithm outside this region. Instead we have implementedthe ability to use a hierarchy of polynomials with Pico byseparately training over the uniformly sampled points inthe full box and over only the points in the constrainedregion. If Pico is given a set of input cosmological pa-rameters within this region it computes the power spec-tra based on a polynomial fit to this constrained region.For points outside this region, Pico defaults to using thepolynomials fit over the full box. While we have foundthat using only a single set of polynomials trained on thebox is sufficient for analysis of current experimental data,this will be a useful feature in the future when data fromhigher resolution experiments become available.

3. RESULTS

Here we demonstrate the performance of Pico for non-flat cosmologies with the dark energy equation of state,wDE allowed to vary (but still constant for a givenmodel). In this space Pico fits the power spectrum andlikelihood as a function of

(

Ωbh2, Ωcdmh2, Ωk, θ, τ, ns, ln 1010As, r, wDE

)

.

The following sections study the accuracy of Pico incomputing the power spectra, matter transfer function,WMAP likelihood as well as its application to parameterestimation based on this 9 parameter model.

3.1. Power Spectra and Matter Transfer Function

In order to demonstrate Pico’s accuracy and robust-ness we will test the algorithm for two cases. The firstcase implements the hierarchy method discussed in sec-tion 2.5. Here the training set is divided into two pieces.The first contains ∼ 18000 samples generated uniformlyfrom the box defined in Table 1, and the second set con-sists of the 15000 points constrained to 25 log-likelihoodsfrom the peak of the WMAP likelihood. For this case thetest set consists of ∼ 2000 points taken from the lattertraining set. These points were removed from the train-ing set and not used to train Pico.

The models in the training set were run throughCAMB at accuracy settings (3,3,3) to compute the truepower spectra and transfer functions. As the ℓ and ksampling used by CAMB is model dependent it is nec-essary to spline the power spectra and transfer functionso that each is computed at the same ℓ or k value. Theℓ-values were chosen to be those used by CAMB for flatmodels with lSamp= 3. For the transfer function weused a unform sampling in ln k. Pico was trained using6th order polynomials, requiring less than 30 minutes ona 2.4GHz desktop.

4

Pico’s performance on this test set is shown in Figure4. The top 2 rows show the TT, TE and EE power spec-tra with the second row focusing on low ℓ. Results forthe BB spectra and matter transfer function are shownin the third row. The two lines in each plot represent themean error and the error bar that bounds 99% of the testset. For the power spectra the error is plotted in units ofthe cosmic standard deviation and for the matter trans-fer function the y-axis shows percent error. From thefigure we see that over the volume of parameter spaceimportant for CMB parameter estimation Pico can com-pute the power spectra for 99% of models in the trainingset to better than 4% of cosmic standard deviation, withthe mean error around 0.5%, over most ℓ-values. Eventhe worst fits to the EE power spectra, which occur justafter the reionization bump, are only about 25% of thecosmic standard deviation. We also note that many ofthe models that are fit poorly have very low power inthis region so no experiment should be sensitive to theseerrors. For the transfer function the 99% error bar isaround 0.25%, and the mean at 0.02%, except at verylow k. This should be sufficient for analysis of data fromthe next generation of large scale structure experiments.

0.018 < Ωbh2 < 0.0340.06 < Ωcdmh2 < 0.2−0.3 < Ωk < 0.31.02 < 100 θ < 1.080.01 < τ < 0.550.85 < ns < 1.252.75 < ln

`

1010As

´

< 4.00 < r < 2

−1.5 < wDE < −0.3

TABLE 1Parameter bounds defining the box the training set was

sampled from for the example in section 3. Thisencompasses a volume of at least 3σ in each parameteraround the WMAP maximum likelihood. Note that wealso impose the prior that the corresponding Hubbleconstant for each parameter point lie in the interval[30, 100] which excludes some regions inside the box.

For the second test case the hierarchy method is notused and Pico is only trained on a uniform sample ofpoints from the box in Table 1. For this case the test setconsists of a uniform sample of ∼ 2000 points from thesame box. Pico’s performance on this test set is shown inFigure 5. We include this case only to allow comparisonwith other codes. When Pico is used to explore the pa-rameter posterior based on CMB constraints, which is itsmain purpose, chains will rarely propose points outsidethe constrained volume used in the hierarchy method.Therefore Figure 4 provides a better indicator of thetypes or error incurred by using Pico to compute thepower spectra. The regression files on the Pico websiteimplement the hierarchy method.

3.2. WMAP Likelihood

Next we test the computation of the WMAP likeli-hood from Pico for two cases. The first uses Pico tocompute the power spectrum and then the WMAP codeto compute the likelihood and the second uses Pico todirectly compute the likelihood. The training set for the

0

0.01

0.02

0.03

0.04

0.05

0.06

0 500 1000 1500 2000 2500

Err

or in

ClT

T

multipole l

Mean99%

0

0.01

0.02

0.03

0.04

0.05

0.06

0 500 1000 1500 2000 2500

Err

or in

ClT

E

multipole l

Mean99%

0

0.01

0.02

0.03

0.04

0.05

0.06

0 500 1000 1500 2000 2500

Err

or in

ClE

E

multipole l

Mean99%

0

0.005

0.01

0.015

0.02

40 80 120 160 200

Err

or in

ClT

T

multipole l

Mean99%

0

0.02

0.04

0.06

0.08

0.1

40 80 120 160 200

Err

or in

ClT

E

multipole l

Mean99%

0

0.05

0.1

0.15

0.2

0.25

0.3

40 80 120 160 200

Err

or in

ClE

E

multipole l

Mean99%

0

0.01

0.02

0.03

0.04

50 100 150 200 250 300 350 400

Err

or in

ClB

B

multipole l

Mean99%

0

0.5

1

1.5

2

2.5

1e-04 0.001 0.01 0.1 1

% E

rror

in T

(k)

k

Mean99%

Fig. 4.— The above plots compare the performance of Pico withCAMB at high accuracy settings for 9 parameter nonflat modelswith wDE 6= 1. Pico was trained using the hierarchy method de-scribed in section 2.5 and the test set consists of 2000 points within25 log-likelihoods of the WMAP peak. The top two rows show theerror compared with CAMB in units of cosmic standard deviationfor the TT, TE and EE power spectra at high ℓ (top) and low ℓ(center). The bottom row shows the error in the BB spectra inunits of the cosmic standard deviation and the percent error in thematter transfer function. The two lines on each plot denote themean error and the error bar that bounds 99% of the test set. Wenote that much of the error at low ℓ in the EE spectra is due tothe 1% of models with extremely low power over this range. Thesespectra are too small to detect even with Planck.

0

0.01

0.02

0.03

0.04

0.05

0.06

0 500 1000 1500 2000 2500

Err

or in

ClT

T

multipole l

Mean99%

0

0.01

0.02

0.03

0.04

0.05

0.06

0 500 1000 1500 2000 2500

Err

or in

ClT

E

multipole l

Mean99%

0

0.01

0.02

0.03

0.04

0.05

0.06

0 500 1000 1500 2000 2500

Err

or in

ClE

E

multipole l

Mean99%

0

0.005

0.01

0.015

0.02

0.025

0.03

40 80 120 160 200

Err

or in

ClT

T

multipole l

Mean99%

0

0.05

0.1

0.15

0.2

40 80 120 160 200

Err

or in

ClT

E

multipole l

Mean99%

0

0.2

0.4

0.6

0.8

40 80 120 160 200

Err

or in

ClE

E

multipole l

Mean99%

0

0.02

0.04

0.06

0.08

0.1

50 100 150 200 250 300 350 400

Err

or in

ClB

B

multipole l

Mean99%

0

0.5

1

1.5

2

2.5

3

1e-04 0.001 0.01 0.1 1

% E

rror

in T

(k)

k

Mean99%

Fig. 5.— The plots are same as those in Figure 4 except herePico was trained and tested over models sampled uniformly fromthe box defined by Table 1. Even over this larger region Pico cancompute the power spectrum in 99% of the test cases to better than5% of cosmic standard deviation over most ℓ and is never worsethan 0.7 cosmic standard deviation.

likelihood computation consists of ∼ 15000 points gener-ated using the method described in section 2.4. Another∼ 2000 points, generated using the same method, wereused as a test set. The absolute error between the like-lihood is shown in Figure 6. The plot on the left showsthe results of using Pico to compute the power spectrumwhile the plot on the right shows the results of directlycomputing the likelihood. For the case of directly eval-uating the likelihood, Pico can compute about 90% ofthe test set better than 0.25 log-likelihoods. When onlyusing Pico to compute the power spectrum the resultsare within 0.25 log-likelihoods for better than 99.5% of

5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

1770 1775 1780 1785 1790

∆Ln

L

-ln LWMAP

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

1770 1775 1780 1785 1790

∆Ln

L

-ln LWMAP

Fig. 6.— The plots show the absolute error when evaluatingthe WMAP likelihood with Pico. In the left plot Pico was usedto compute the power spectrum which were fed into the WMAPcode. The right plot shows the absolute error when Pico directlyevaluates the likelihood. In both cases the likelihood is comparedto the value of the WMAP code using high accuracy CAMB powerspectra.

the models. The training set and test set were computedusing high accuracy CAMB runs and version v2p2p2 ofthe WMAP likelihood code.6

3.3. Parameter Estimation

To test the application of Pico to this 9 param-eter model we ran Markov chains using CosmoMCwith the WMAP(Hinshaw et al. 2007; Page et al.2007), ACBAR(Kuo et al. 2004), CBI(Readhead et al.2004) Boomerang(Montroy et al. 2006; Jones et al.2006; Piacentini et al. 2006), 2df(Cole et al. 2005),SDSS (Adelman-McCarthy et al. 2006), and SNLS(Astier et al. 2005) data sets. Chains were run for 3cases. The first uses CAMB and the WMAP likeli-hood code (CAMB+WMAP), the second uses Pico tocompute the power spectra and transfer function butstill uses the official likelihood codes (PICO+WMAP)and the third case uses Pico to compute the powerspectra, transfer function and the WMAP likelihood(PICO). In third case we did not use Pico to fit the 2df,SDSS or SNLS likelihood codes. The 1-dimensional,marginalized posteriors for each of the 9 parametersare shown along the diagonal in Figure 7. The plotsin the lower (upper) triangle in the figure compare the2 dimensional posteriors between the PICO+WMAP(PICO) case and the CAMB+WMAP case. In all ofthe plots the CAMB+WMAP results are shown inred, the PICO+WMAP results in green and the PICOresults in red. The lines in the 2D plots denote the68% and 99% contours. Using 6 chains run in parallel,the PICO+WMAP chains ran about 60 times fasterthan without Pico, requiring about 4 hours of wall clocktime. Using Pico to compute the WMAP likelihoodgave another factor of 2.5 decrease in CPU time givinga total speed up of ∼ 150 over chains run without Pico.In all cases the chains finished with a Gelman-Rubinstatistic less than 1.01.

4. CONCLUSION

This paper describes a major new release of Pico, afast and accurate code for computing the CMB powerspectrum, matter transfer function and the WMAP like-lihood. We noted the presence of numerical noise inCAMB at standard accuracy for nonflat models and itseffect on the WMAP likelihood. To solve this problemwe have generated training sets running CAMB at highaccuracy settings. Also we have presented a method of

6 http://lambda.gsfc.nasa.gov

generating a training set that finds the high likelihoodregion of parameter space without ever running CAMBin serial. This is especially useful for training Pico to fitthe WMAP likelihood in large dimensional spaces. Fur-thermore, Pico can be trained separately on the powerspectra in this smaller region of parameter space allowingeven more accurate results around the peak of the like-lihood while still maintaining the ability to compute thepower spectra over a large box in parameter space. Thecombination of these improvements, along with modifica-tions to the Pico algorithm, have increased its accuracyin computing the power spectrum and likelihood. Alsowe have extended Pico to compute the power spectrumdue to tensor perturbations as well as the matter powerspectrum. On the Pico homepage,7 we provide the newversion of Pico and new sets of regression coefficients.We have also released the training code for Pico, allow-ing users to apply the algorithm to new classes of modelsand parameter sets. We expect that the accuracy andspeed achieved by Pico will be useful for current andfuture CMB and large scale structure observations. Fur-thermore, we hope that the concept, embodied by Pico,of exploiting massively parallel computing resources tosolve inherently serial numerical problems will find ap-plications beyond the immediate domain of cosmologicalparameter estimation.

This work was partially funded by NSF grantsAST 05-07676 and AST 07-08849, by NASA contractJPL1236748, by the National Computational Science Al-liance under AST300029N, by the University of Illinois,by the Computational Science and Engineering Depart-ment at the University of Illinois and by a Friedrich Wil-helm Bessel research prize from the Alexander von Hum-boldt foundation. We utilized the Teragrid(Catlett et al.2007) Itanium 2 clusters at NCSA and at Argonne Na-tional Laboratory, as well as the Turing cluster in theComputational Science and Engineering Department atthe University of Illinois at Urbana-Champaign.

We thank the Max Planck Institute for Astrophysicsfor its hospitality while part of this work done. We alsothank the users of the Cosmology@Home8 project whosedonated CPU hours helped make this work possible.9

In particular we would like to thank Scott Kruger, theadministrator of Cosmology@Home, as well as the userslaurenu2, PoorBoy, [BˆS]ralfi65, Mitchell, and Mike TheGreat as representatives of all Cosmology@Home partic-ipants. Lastly we would like to thank Nikita Sorokin forhis work in designing the Pico homepage.

Funding for the Sloan Digital Sky Survey (SDSS)has been provided by the Alfred P. Sloan Foundation,the Participating Institutions, the National Aeronauticsand Space Administration, the National Science Foun-dation, the U.S. Department of Energy, the JapaneseMonbukagakusho, and the Max Planck Society. TheSDSS Web site is http://www.sdss.org/. The SDSSis managed by the Astrophysical Research Consortium(ARC) for the Participating Institutions. The Participat-ing Institutions are The University of Chicago, Fermilab,the Institute for Advanced Study, the Japan Participa-

7 http://cosmos.astro.uiuc.edu/pico8 http://www.cosmologyathome.org9 http://www.cosmologyathome.org/top users.php

6

Fig. 7.— The cosmological parameter posteriors using CAMB and Pico for 9 parameter nonflat models with wDE 6= −1 based on theWMAP, ACBAR, CBI, Boomerang, SDSS, 2df and SNLS data sets. The red lines are the result of using CAMB and the WMAP likelihoodcode. The green lines use Pico to compute the power spectrum but still uses the WMAP likelihood. The blue lines are the result fromusing Pico to compute the power spectrum and WMAP likelihood. The plots in the lower triangle show the 68% and 99% contours for thechains run using CAMB and PICO with the WMAP likelihood code. The upper triangle compares the CAMB chains to those using Picoto compute the power spectra and WMAP likelihood.

7

tion Group, The Johns Hopkins University, the KoreanScientist Group, Los Alamos National Laboratory, theMax-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico

State University, University of Pittsburgh, Universityof Portsmouth, Princeton University, the United StatesNaval Observatory, and the University of Washington.

REFERENCES

J. K. Adelman-McCarthy et al. [SDSS Collaboration], Astrophys.J. Suppl. 162, 38 (2006)

P. Astier et al. [The SNLS Collaboration], Astron. Astrophys.447, 31 (2006)

T. Auld, M. Bridges, M. P. Hobson and S. F. Gull, Mon. Not.Roy. Astron. Soc. Lett. 376, L11 (2007)

T. Auld, M. Bridges and M. P. Hobson, arXiv:astro-ph/0703445.C. Catlett et al., HPC and Grids in Action, Ed. Lucio

Grandinetti, IOS Press ’Advances in Parallel Computing’series, Amsterdam (2007)

S. Cole et al. [The 2dFGRS Collaboration], Mon. Not. Roy.Astron. Soc. 362, 505 (2005)

W. A. Fendt and B. D. Wandelt, Astrophys. J. 654, 2 (2007)S. Habib, K. Heitmann, D. Higdon, C. Nakhleh and B. Williams,

arXiv:astro-ph/0702348.A. Hajian, Phys. Rev. D 75, 083525 (2007)K. Heitmann, D. Higdon, C. Nakhleh and S. Habib, Astrophys. J.

646, L1 (2006)G. Hinshaw et al. [WMAP Collaboration], Astrophys. J. Suppl.

170, 288 (2007)R. Jimenez, L. Verde, H. Peiris and A. Kosowsky, Phys. Rev. D

70, 023005 (2004)M. Kaplinghat, L. Knox and C. Skordis, Astrophys. J. 578, 665

(2002)Karhunen, K., Ann, Acad. Sci. Fennicae, 37 (1946)Kirby, M. 2001, Geometric Data Analysis: An Empirical

Approach to Dimensionality Reduction and the Study ofPatterns (New York: John Wiley & Sons)

C. l. Kuo et al. [ACBAR collaboration], Astrophys. J. 600, 32(2004)

W. C. Jones et al., Astrophys. J. 647, 823 (2006)A. Lewis, A. Challinor and A. Lasenby, Astrophys. J. 538, 473

(2000)A. Lewis and S. Bridle, Phys. Rev. D 66, 103511 (2002)Loeve, M. 1955, Probability Theory (Princeton: Van Nostrand)MacQueen, J. 1967, Proc. 5th Berkeley Symp. on Mathematical

Statistics and Probability, 1, 281T. E. Montroy et al., Astrophys. J. 647, 813 (2006)

[arXiv:astro-ph/0507514].L. Page et al. [WMAP Collaboration], Astrophys. J. Suppl. 170,

335 (2007)F. Piacentini et al., Astrophys. J. 647, 833 (2006)

[arXiv:astro-ph/0507507].A. C. S. Readhead et al., Astrophys. J. 609, 498 (2004)

[arXiv:astro-ph/0402359].H. B. Sandvik, M. Tegmark, X. M. Wang and M. Zaldarriaga,

Phys. Rev. D 69, 063005 (2004)U. Seljak and M. Zaldarriaga, Astrophys. J. 469, 437 (1996)D. N. Spergel et al. [WMAP Collaboration], Astrophys. J. Suppl.

170, 377 (2007)M. Tegmark and E. F. Bunn, Astrophys. J. 455, 1 (1995)