Do weak AR4 models bias projections of future climate changes over Australia?

Climatic Change (2009) 93:527–558
DOI 10.1007/s10584-008-9502-1

S. E. Perkins · A. J. Pitman

Received: 14 May 2007 / Accepted: 2 September 2008 / Published online: 5 November 2008
© Springer Science + Business Media B.V. 2008

Abstract Regional climate projections using climate models commonly use an “all-model” ensemble based on data sets such as the Intergovernmental Panel on Climate Change’s (IPCC) 4th Assessment (AR4). Some regional assessments have omitted models based on specific criteria. We use a criterion based on the capacity of climate models to simulate the observed probability density function calculated using daily data, model-by-model and region-by-region for each of the AR4 models over Australia. We demonstrate that by omitting those climate models with relatively weak skill in simulating the observed probability density functions of maximum and minimum temperature and precipitation, different regional projections are obtained. Differences include: larger increases in the mean maximum and mean minimum temperatures, but smaller increases in the annual maximum and minimum temperatures. There is little impact on mean precipitation but the better models simulate a larger increase in the annual rainfall event combined with a larger decrease in the number of rain days. The weaker models bias the amount of mean warming towards lower increases, bias annual maximum temperatures to excessive warming and bias precipitation such that the amount of the annual rainfall event is under-estimated. We suggest that omitting weak models from regional scale estimates of future climate change helps clarify the nature and scale of the projected impacts of global warming.

S. E. Perkins · A. J. Pitman
Climate Change Research Centre, University of New South Wales, Sydney, Australia

A. J. Pitman (B)
Climate Change Research Centre, The University of New South Wales, Red Centre Building, Sydney, NSW 2052, Australia
e-mail: [email protected]



1 Introduction

Projecting the impact of increasing greenhouse gases on the global, continental and regional scale is a major scientific challenge. At the heart of climate projections are coupled climate models, the tools that underpin the recent 4th Assessment Report (AR4, Solomon et al. 2007) by the Intergovernmental Panel on Climate Change (IPCC). Climate models are based on well established physical principles and reproduce most significant features of the observed climate (Randall et al. 2007). Randall et al. (2007) conclude that there is now considerable confidence that coupled climate models provide credible quantitative estimates of future climate change particularly at continental scales and above. They note that confidence is higher for some climate variables (e.g., temperature) than for others (e.g., precipitation).

For impacts assessment, continental-scale projections are not particularly practical for planning, for providing specific local advice for adaptation or for gauging the impact of climate changes on most biophysical, human or economic systems. One reason why climate models are deemed “credible” at “continental scales and above” and therefore, at least by implication, less credible at sub-continental scales is that at regional scales climate models are quite variable in how they respond to increasing greenhouse gas concentrations. However, while an individual climate model may be “known” to be unreliable, omitting specific climate models from an assessment like AR4 is very difficult due to the lack of a community-agreed metric. Many different metrics that quantify climate model skill have been developed (e.g. Watterson 1996; Taylor 2001; Knutti et al. 2006; Piani et al. 2005; Shukla et al. 2006; Johns et al. 2006) but tend, when implemented, to use monthly to annual timescale data, sometimes over ensemble means of climate models.

Perkins et al. (2007) introduced one metric that assessed climate models by comparing the observed and modeled distributions using daily data. Probability density functions (PDFs) were calculated for each observed and modeled dataset to calculate the probability of each event in the distribution occurring, not just at a priori points such as the mean. The metric then compares the observed and simulated probabilities at each magnitude to give an overall performance score for each climate model. This procedure used daily data, region-by-region for precipitation, minimum temperature and maximum temperature. Perkins et al. (2007) ranked the AR4 models using the PDF-weighted skill score, demonstrating considerable variation among the AR4 models (see Table 1) over regional Australia, with MIROC-m, CSIRO and MRI overall performing consistently well relative to other models.

Fundamental to this paper is our unprovable assertion that a model that is able to simulate the PDF of a variable well for the 20th Century is more likely to be able to simulate a future PDF. Consider a model that has a high level of skill in simulating the current PDF of daily maximum temperature. This model must be able to simulate the drivers and associated feedbacks for the current climate well. The model must capture, at a daily timescale, the interactions between the surface, boundary layer, clouds and radiation well, else the PDF would be biased towards high values (too little soil moisture, too little evaporation, too little cloud etc.) or low values (too much surface moisture, high evaporation and associated cloud leading to too little radiation). Given this complexity, it is hard to imagine a model capturing the observed PDF of maximum daily temperature with a high degree of skill fortuitously. Now, imagine the PDF for maximum temperature for 2050. There


Table 1 All climate models with daily data for TMAX, TMIN and P available from PCMDI

Acronym   Model               Source
BCCR      bccr_bccm2_0        Bjerknes Centre for Climate Research (BCCR), University of Bergen, Norway
CGCM-h    cccma_cgcm3_1_t63   Canadian Centre for Climate Modeling and Analysis
CGCM-l    cccma_cgcm3_1_t47   Canadian Centre for Climate Modeling and Analysis
CSIRO     csiro_mk3_0         Australian Commonwealth Scientific and Research Organization
GFDL2.0   gfdl_cm2_0          Geophysical Fluid Dynamics Laboratory
GFDL2.1   gfdl_cm2_1          Geophysical Fluid Dynamics Laboratory
GISSAOM   giss_aom            Goddard Institute of Space Studies (NASA)
GISS ER   giss_model_e_r      Goddard Institute of Space Studies (NASA)
FGOALS    iap_fgoals1_o_g     Institute of Atmospheric Physics, Chinese Academy of Sciences
IPSL      ipsl_cm4            Institut Pierre Simon Laplace
MIROC-h   miroc3_2_hires      Centre for Climate System Research, University of Tokyo; National Institute for Environmental Studies; Frontier Research Centre for Global Change
MIROC-m   miroc3_2_medres     Centre for Climate System Research, University of Tokyo; National Institute for Environmental Studies; Frontier Research Centre for Global Change
ECHO-G    miub_echo_g         Max Planck Institut für Meteorologie
ECHAM     mpi_echam5          Max Planck Institut für Meteorologie
MRI       mri_cgcm2_3_2a      Japan Meteorological Agency
CCSM      ncar_ccsm3          National Centre for Atmospheric Research

Column 1 is the acronym used in the text. Column 2 is the name of the model used in the PCMDI archive and column 3 is the source of the model (see http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php)

will be a considerable overlap between this future PDF and the current PDF. Within this region where the two PDFs overlap is a region of physical and biophysical climate-space where the model has already demonstrated that it can capture the processes and feedbacks. The demonstration that a model has skill in this overlap region gives us some confidence that it can capture these processes and feedbacks in the future. As the change in the PDF increases such that the overlap is reduced, our confidence may decline, but Earth would be uninhabitable well before this overlap becomes negligible. It is, of course, possible that non-linearities and abrupt change in how the climate system operates could lead to major reorganization of the ocean–atmosphere–biosphere system (Pitman and Stouffer 2006) making skill in the present a poor guide to skill in the future. However, while it is reasonable to argue that a model able to simulate the present may be unable to simulate the future, we see no evidence in the literature to demonstrate that a model unable to simulate the present might be superior to one that can in simulating the future.

While climate models were developed to simulate large spatial scales on longer (monthly, seasonal, annual) time scales, the impact of global warming is likely to be realized at finer spatial and temporal scales. Climate on timescales of days has a direct impact on human health (Trigo et al. 2005) and human activities (e.g. agriculture, Luo et al. 2005) and changes in parts of a modeled distribution other than the mean (e.g. the tails) are likely to affect humans, natural ecosystems, agricultural crops etc., more than changes in the mean (Katz and Brown 1992; Colombo et al. 1999;


Easterling et al. 2000). There are mixed views as to the relation between projected changes in mean and the change in extremes. Mearns et al. (1984, 1990), Katz and Brown (1992), Hennessy and Pittock (1995), Colombo et al. (1999) and Meehl et al. (2000) suggest that extremes may change more than indicated by a change in the mean. Some studies have looked at a sequence of extreme events, rather than a single threshold. For example, Hennessy and Pittock (1995) noted that if mean temperature increased by 3°, the probability of five consecutive days above 35° increased five-fold. In contrast, Kharin and Zwiers (2005) found that warm extremes change at a similar rate as mean temperature while cool extremes change at a faster rate in a warming world. Griffiths et al. (2005) showed that over the Asia-Pacific region the change in mean temperature over a set of observed stations was a good predictor of the change for both minimum and maximum temperature. In all studies, PDFs shift towards the right, i.e. the probability of warmer events increased and cooler events decreased. Disagreement stems from whether the shape of the PDF changes. There is also disagreement about the effect of a change in mean precipitation on extreme precipitation. Yonetani and Gordon (2001) conclude that increases (decreases) in mean precipitation occur in the same regions where there are extremes of large (small) annual precipitation. However, Kharin and Zwiers (2005) conclude that changes in extreme precipitation are substantially larger than the mean, and increase by a factor of two by the end of the 21st Century.

In the AR4 assessment of the impact of warming on the Australian climate, Christensen et al. (2007) state:

All of Australia . . . [is] very likely to warm during this century . . . comparable overall to the global mean warming. The warming is smaller in the south, especially in winter . . . Increased frequency of extreme high daily temperatures [will occur] in Australia . . . and [a] decrease in the frequency of cold extremes is very likely; and

Precipitation is likely to decrease in Southern Australia in winter and spring. Precipitation is very likely to decrease in Southwestern Australia in winter . . . Changes in rainfall in Northern and Central Australia are uncertain. Extremes of daily precipitation will very likely increase. The effect may be offset or reversed in areas of significant decrease in mean rainfall (southern Australia in winter and spring).

Christensen et al. (2007) noted that no systematic evaluation for Australia of the AR4 data set existed at the time the assessment was prepared. They therefore provided an understandably brief assessment of the overall capacity of the AR4 models focussing on the ensemble mean performance. Given Perkins et al. (2007) demonstrated that climate models within the AR4 ensemble had varying skill, the question arises whether some models are better than others for assessing the effects of warming on the Australian climate. While the AR4 models all simulate warming, do models with more skill in simulating recent observations project more warming than the average of all models or less? Do the better models simulate more or less of a change in rainfall than the ensemble of all models? Are rarer events such as the annual rainfall event or the annual temperature maximum substantially different among those models that can simulate the 20th Century well? Is the projected decrease in rainfall over Western Australia of 10–20% in the annual total robust or biased by weak models?


Fig. 1 Location map of Australia showing state boundaries, major cities and the regions discussed in the text

In this paper, we explore the impact of increasing greenhouse gases on the projection of changes in precipitation (P), minimum temperature (TMIN) and maximum temperature (TMAX) on regional Australia. While our focus is on Australia, the methodology is transferable elsewhere. We explore whether selecting AR4 models based on their skill in simulating the observed PDF modifies the projected warming and change in precipitation over Australia. We achieve this by demonstrating how using the skill score determined by Perkins et al. (2007) changes the projected range for both means and annual return periods of the overall model ensembles. The choice of these three variables was based on available data and their impact on human and biological systems (Colombo et al. 1999; Meehl et al. 2000; Trigo et al. 2005). We provide a methodology in Section 2, results in Sections 3–5, discussion in Section 6 and conclusions in Section 7.

2 Data and methods

2.1 Data

Daily climate model data over Australia for P, TMIN and TMAX were taken from the PCMDI archive (http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php). Data from 1981–2000 from the Climate of the Twentieth Century simulations were used as the control (see Perkins et al. 2007). In this paper we also use results from the B1 (relatively low emissions) and A2 (relatively high emissions) scenarios for two time periods: 2046–2065 (hereafter 2050) and 2081–2100 (hereafter 2100). These time periods were chosen as they were common among all models. By using daily data we retain the maximum time resolution possible and minimize the hiding of biases through averaging. Some data sets contained erroneous data (gaps, periods of repetitive data), or data were not available at the time this study was undertaken, and were


therefore omitted from subsequent analysis. Table 1 lists all models used. We used each independent realization directly in the initial analysis rather than averaging these realizations to produce an ensemble result. However, we present ensembles over the available realizations for each climate model since the differences between realizations from each climate model were negligible. Model specific masks were fitted to exclude ocean data. Daily observed P, TMIN and TMAX were obtained from the Australian Bureau of Meteorology for the period 1981–2000. The use of observed data is fully discussed in Perkins et al. (2007).

2.2 Skill-score

Perkins et al. (2007) present a simple skill score to rank the AR4 climate models over regional Australia using the overlap between two PDFs. This measurement of the common area between two PDFs equals 1.0 for a perfect model (where the modeled and observed PDFs overlap perfectly) and 0.0 where the two PDFs do not overlap at all. Perkins et al. (2007) demonstrate that this measure is robust against uncertainty in the temporal and spatial coverage of the observations. Perkins et al. (2007) use the skill score to compare the observed PDF for the variable in question and the 20th Century modeled PDF for a suite of ∼10° × 10° regions. Bin sizes were 0.5°C for TMAX and TMIN and 1 mm d−1 for P. All daily values of P below 0.2 mm d−1 were omitted because rates below this amount are not recorded in the observations (Parkinson 1986).
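The overlap measure described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the paper: it assumes the common area is computed as the sum over bins of the smaller of the two relative frequencies, and the function name `pdf_skill_score` and its arguments are hypothetical.

```python
import numpy as np


def pdf_skill_score(model, obs, bin_width, lower=None):
    """Common area of two empirical PDFs (1.0 = identical, 0.0 = no overlap).

    Assumption: the overlap is the sum over bins of min(Z_model, Z_obs),
    where Z is the relative frequency of daily values in each bin.
    Both samples must be non-empty after any filtering.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if lower is not None:
        # e.g. lower=0.2 drops daily precipitation below 0.2 mm/d
        model = model[model >= lower]
        obs = obs[obs >= lower]

    # Shared bin edges aligned to multiples of bin_width
    lo = np.floor(min(model.min(), obs.min()) / bin_width) * bin_width
    hi = np.ceil(max(model.max(), obs.max()) / bin_width) * bin_width
    n_bins = max(int(round((hi - lo) / bin_width)), 1)
    edges = lo + bin_width * np.arange(n_bins + 1)

    z_model = np.histogram(model, bins=edges)[0] / model.size
    z_obs = np.histogram(obs, bins=edges)[0] / obs.size
    return float(np.minimum(z_model, z_obs).sum())
```

With the bin sizes quoted above, a temperature series would be scored with `bin_width=0.5` and a precipitation series with `bin_width=1.0, lower=0.2`.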

In this paper we use the PDF-based skill score as the basis for omitting models from an assessment of the impact of increasing greenhouse gases over Australia. We omit models based on two thresholds: 0.7 and 0.8. The choice of these was subjective, balancing the desire to only include those climate models with demonstrated skill, while recognizing that the sample size of models needs to be kept reasonable. Had we chosen a skill score of 0.6 virtually no models would be excluded while 0.9 would mean virtually no models were included. We do not weight models using the skill-score. Principally, this is because we do not want to bias results via the use of a model, even with a low weighting, that has been shown to be less skilled than other models. Weighting models to maintain a larger sample size has merit (Murphy et al. 2004; Tebaldi et al. 2005). However, Stainforth et al. (2007) argue that relative weights cannot be usefully assigned to different models and suggest a weight of zero for models that are substantially worse than the state-of-the-art. We utilize this method, but using individual variables rather than a wide range of observed variables.
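The zero-weight approach amounts to a binary weighting of the ensemble. The sketch below illustrates how the retained sample size shrinks as the threshold rises; the model names and scores are hypothetical and are not values from Perkins et al. (2007).

```python
def binary_weights(skill_scores, threshold):
    """Weight 1.0 for models whose skill exceeds the threshold, 0.0 otherwise."""
    return {name: 1.0 if score > threshold else 0.0
            for name, score in skill_scores.items()}


# Hypothetical scores for illustration only (not values from the paper)
example_scores = {"model_a": 0.91, "model_b": 0.84,
                  "model_c": 0.76, "model_d": 0.65}

for threshold in (0.6, 0.7, 0.8, 0.9):
    weights = binary_weights(example_scores, threshold)
    kept = [name for name, w in weights.items() if w > 0]
    print(f"threshold {threshold}: {len(kept)} of {len(example_scores)} models kept")
```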

Fig. 2 Change in the TMAX (°C) over Australia simulated for the B1 emission scenario for (a) AR4 models with skill scores over 0.8 in 2050, (b) difference of the >0.8 models minus the all-model ensemble in 2050, (c) AR4 models with skill scores over 0.8 in 2100, (d) difference of the >0.8 models minus the all-model ensemble in 2100, (e) as (a) but for the A2 emission scenario, (f) as (b) but for the A2 emission scenario, (g) as (c) but for the A2 emission scenario and (h) as (d) but for the A2 emission scenario

2.3 Calculation of continent-ensemble variables

To demonstrate how the skill score influenced the members of a multi-model ensemble, the mean of each variable (TMAX, TMIN, P) and the annual return period (the 99.7th percentile for TMAX (TMAX99) and P (P99); the 0.3rd percentile for TMIN (TMIN0.3)) were calculated. To create a multi-model ensemble, the differenced value for each model ensemble was linearly interpolated to a common 2° × 2° grid. The average of all models for the differenced variable over each time period and scenario was then calculated. This “skill-based” average was then compared with the “all-member” ensemble. Skill masks for >0.7 and >0.8 for each individual model were created for three regions used by Perkins et al. (2007): Region 2 (26.5°–35.25°S, 143.75°E–154°E); Region 3 (17.75°S–26.5°S, 143.75°E–154°E); and Region 11 (26.5°–35.25°S, 113°–123.25°E) (Fig. 1).
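The regridding and averaging steps can be sketched as follows. This is an illustrative outline under several assumptions: the "differenced value" is taken to be future minus control, a single skill score per model stands in for the regional skill masks, the grid bounds are approximate, and all names and numbers are hypothetical.

```python
import numpy as np


def regrid_linear(field, lat, lon, new_lat, new_lon):
    """Separable linear interpolation of a (lat, lon) field to a new grid.

    lat and lon must be increasing; np.interp holds edge values constant
    for target points outside the source grid.
    """
    field = np.asarray(field, dtype=float)
    along_lon = np.array([np.interp(new_lon, lon, row) for row in field])
    return np.array([np.interp(new_lat, lat, col) for col in along_lon.T]).T


def ensemble_mean_change(changes, skill=None, threshold=None):
    """Average regridded change fields, optionally keeping only skilful models."""
    names = [n for n in changes
             if threshold is None or skill[n] > threshold]
    return np.mean([changes[n] for n in names], axis=0)


# Common 2 x 2 degree grid roughly covering Australia (illustrative bounds)
grid_lat = np.arange(-44.0, -9.0, 2.0)
grid_lon = np.arange(112.0, 156.0, 2.0)

# Hypothetical models on their own native grids, with made-up change fields
native = {
    "model_a": (np.linspace(-46, -8, 20), np.linspace(110, 158, 25), 1.0),
    "model_b": (np.linspace(-46, -8, 14), np.linspace(110, 158, 17), 3.0),
}
skill = {"model_a": 0.9, "model_b": 0.6}  # hypothetical skill scores

changes = {}
for name, (lat, lon, warming) in native.items():
    field = np.full((lat.size, lon.size), warming)  # future minus control
    changes[name] = regrid_linear(field, lat, lon, grid_lat, grid_lon)

all_model = ensemble_mean_change(changes)
skill_based = ensemble_mean_change(changes, skill, threshold=0.8)
difference = skill_based - all_model  # skill-based minus all-model ensemble
```

In this toy case the weaker model warms more, so the skill-based ensemble is cooler than the all-model ensemble everywhere on the grid.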

3 Simulation of mean changes over Australia

3.1 Mean maximum temperature (TMAX)

Figure 2 shows the simulation of TMAX by those AR4 climate models with a PDF-weighted skill score >0.8 for the B1 emission scenario for 2050 (Fig. 2a) and for 2100 (Fig. 2c). Warming over most of Australia is 1.0–2.0°C (2050) and 2.0–3.0°C (2100). Less warming is projected along the south coast (0.5–1.5°C) in agreement with Christensen et al. (2007). Figure 2b and d show the difference between this skill-based ensemble and the all-model ensemble mean. The skill-based ensemble simulates less warming than the all-model ensemble over Victoria by 0.3–1.0°C in both 2050 and 2100 and simulates more warming (0.1 to 0.3°C, 2050 and 0.1 to 1.0°C, 2100) than the all-model ensemble over many other regions.

Figure 2e–h shows the result for the A2 emission scenario. This is similar to Figure 11.17 in Christensen et al. (2007) (they show the annual average surface air temperature rather than the annual mean daily maximum air temperature). The increase in TMAX is about 1–2°C higher than the warming in the daily mean, on the annual average. Warming in the skill-based ensemble is 1.0–2.0°C over the eastern half of the continent and 2.0–2.5°C over the western half (Fig. 2e, 2050). This increases to 2.5–4.0°C over eastern regions in 2100 and 3.5–5.0°C over much of Western Australia (Fig. 2g). If models with skill scores <0.8 are included, less warming is projected over parts of eastern and south-eastern Australia but more warming occurs over the tropics and Western Australia (2050, Fig. 2f). In 2100, omitting weaker models leads to more warming over the south and south-eastern coasts (mainly 0.1–0.3°C) and less warming over the north-east sub-tropics (0.1–1.0°C, Fig. 2h).

Fig. 3 As Fig. 2 but for TMIN (°C)

3.2 Mean minimum temperature (TMIN)

Figure 3 shows the change in TMIN simulated by the more skilful models for the B1 and A2 scenarios. Warming in TMIN is generally 1.0–2.0°C (2050, Fig. 3a), increasing to 1.5–2.5°C (2100, Fig. 3c). The southern coast of Australia experiences less warming in both 2050 and 2100 in agreement with Christensen et al. (2007). Omitting those models with skill scores <0.8 affects the pattern of warming in TMIN in 2050 (Fig. 3b) and 2100 (Fig. 3d). Large regions warm by 0.1–0.3°C more in the skill-based ensemble, although there are hints of less warming (0.1–0.3°C) in parts of the tropics.

Figure 3e–h shows the results for the A2 scenario. The much larger increase in warming compared to the B1 scenario is clear by 2050 and dramatic by 2100. In 2050, Fig. 3e shows least warming along the south coast (1.0–1.5°C) and most in the sub-tropics (2.0–2.5°C). By 2100, small areas warm by more than 4°C in the skill-based ensemble (Fig. 3g). This level of warming is masked by the weaker models and if these are included warming is reduced by 0.3–1.0°C (Fig. 3h). Indeed, Fig. 3h shows that most of the continent warms by 0.1–0.3°C more, and some areas by 0.3–1.0°C more, if only those models with skill >0.8 are included. This additional warming, in the ensemble average, is obtained simply by omitting models with the weakest skill scores over the 20th Century. The weaker models bias the ensemble average change in TMIN, leading to an underestimate in the projected warming relative to the more skilful models.

3.3 Mean precipitation (P)

Figure 4 shows the impact of selecting climate models based on skill scores on the projected change in P over Australia. Under the B1 emission scenario Northern Australia experiences a small increase in P by 2050 in the skill-based model ensemble (0.3–1.0 mm d−1, Fig. 4a), an increase which is weakly sustained through to 2100 (Fig. 4c). Most regions where the population is most dense show little change in P (Fig. 4a). The patterns of change in P in 2050 and 2100 are biased over large areas by models that cannot simulate the 20th century precipitation PDF well (Fig. 4b). Including only models with skill scores >0.8 leads to additional increases in P simulated for 2050 and 2100 over the tropics, sub-tropics and eastern states (Fig. 4b, d). Thus, rainfall projections, averaged over the best models, show an increase in P over many regions of Australia.

The result for 2050 A2 (Fig. 4e) is similar to that for 2100 B1 (Fig. 4d), noting that the area of increased P is smaller in 2050 B1 (Fig. 4a). A very strong drying around coastal Australia is clear in Fig. 4g. This coastal drying in the A2 emission scenario does not change if weak models are included (Fig. 4h). However, the increase in P under the A2 emission scenario over the tropics, extending southwards in 2050 (Fig. 4f), is hidden by weaker models. Thus, over Queensland and the Northern Territory, omission of the weak models enhances the increased P seen in the skill-based model ensemble.

Fig. 4 As Fig. 2 but for P (mm d−1)

4 Simulation of changes in the annual event over Australia

We explore the future behavior of extremes compared to the mean by analyzing the change in TMAX99, TMIN0.3 and P99 from the AR4 models (approximately the annual event). We explore whether the change in the annual return value is larger than the change in the mean, and also whether this change differs between the all-model ensemble and those models with strong 20th Century skill scores. We also analyzed the 95th and 5th percentiles and found very similar results.
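The link between these percentiles and the annual event is simple arithmetic: a daily value exceeded on average once in 365 days lies at about the 99.7th percentile. The sketch below makes this explicit; it assumes a 365-day year, and the function name `annual_event` is hypothetical.

```python
import numpy as np

DAYS_PER_YEAR = 365  # assumption: leap days ignored

# A value exceeded on average once per year sits at this percentile
upper_percentile = 100.0 * (1.0 - 1.0 / DAYS_PER_YEAR)  # about 99.73
lower_percentile = 100.0 / DAYS_PER_YEAR                 # about 0.27


def annual_event(daily_values, kind="high"):
    """Approximate annual event: 99.7th (high) or 0.3rd (low) percentile."""
    q = 99.7 if kind == "high" else 0.3
    return float(np.percentile(daily_values, q))
```

For TMAX and P the `"high"` setting applies, while TMIN uses `"low"`.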

4.1 Maximum temperature (TMAX99)

Figure 5 shows the change in TMAX99 for the B1 emission scenario (this can be compared with Fig. 2 for TMAX). TMAX99 warms by 2.0–2.5°C in most of the southern half of Australia and by 1.5–2.0°C over most of the northern half. In contrast to Fig. 2 where the least warming occurred in the southern parts of the continent, most warming in TMAX99 occurs along the south coast, exceeding 3.0°C in most areas and locally exceeding 5.0°C. This warming is substantially reduced if weak models are included (Fig. 5b). A similar result is clear in 2100 (Figs. 5c and d) except the amount of warming in TMAX99 exceeds 4–5°C over larger areas. Under the A2 scenario in 2050 (Fig. 5e, g) warming is 0.5°C less than the B1 scenario except over Victoria where TMAX99 increases by 4–5°C. Over areas of south eastern Australia, adding in the weaker models masks warming, but over a large part of the central and western regions, the skill-based ensemble projects considerably less warming than the all-model ensemble. By 2100, Fig. 5g suggests that almost the entire continent warms by >3°C in TMAX99, with warming exceeding 5°C in the southern states. As with Fig. 5f, the better models simulate more warming over south eastern regions and less over central and western regions than the all-model ensemble.

Most confronting in Fig. 5, for each projection, is that the most warming in TMAX99 occurs over south-west Western Australia and the coast of South Australia and Victoria. While, as shown by Christensen et al. (2007), these regions warm least in the mean, they warm most (by 5.0–8.0°C) in TMAX99. This warming is masked to an important degree by weak models.

Figure 6 shows the observed, the all-model range, the range for those models with skill-scores >0.7 and those with a skill-score >0.8 for TMAX and TMAX99 for the three regions. The ensemble means are not shown because results within the range shown in Fig. 6 for models with skill scores >0.8 are equally reliable. First, in the case of the results for the 20th Century climate, the range of the ensemble for both TMAX and TMAX99 is close to the observed in each region. Where the 20th Century multi-model ensemble is quite wide (Regions 2 and 3), excluding models on the basis of their PDF-skill score reduces the range of the ensemble to more closely resemble the observed. Climate model selection is based on the whole PDF and Fig. 6 shows an improved simulation of both the regional TMAX and TMAX99, which was to be expected. A noteworthy feature is how impressive the models included in the skill-based ensemble are in simulating TMAX and TMAX99 at regional scales.

Fig. 5 As Fig. 2 but for TMAX99 (°C)

One feature common to all regions (Fig. 6) is that as weaker models are omitted the range represented by the thin bars reduces. This is not merely due to reducing sample size, as a random selection of models to match the number in each ensemble average does not reproduce this result. In most examples, the reduction in the range is due to omission of models at the upper end of the projections; models simulating the greatest regional warming are almost always amongst the weaker models. In some regions, omitting weak models leads to a very narrow range in projected warming (Fig. 6, Region 3) while elsewhere it makes little difference (Region 11).
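A sample-size check of this kind can be sketched as follows. Instead of random draws, the example enumerates every subset of the same size, which is the exhaustive counterpart of random selection and is feasible for an ensemble of this size. The warming values and function names are hypothetical and are not taken from Fig. 6.

```python
from itertools import combinations


def spread(values):
    """Range (max minus min) of a set of model projections."""
    return max(values) - min(values)


def fraction_as_narrow(all_values, subset_values):
    """Fraction of same-size subsets whose range is no wider than the subset's."""
    target = spread(subset_values)
    k = len(subset_values)
    ranges = [spread(c) for c in combinations(all_values, k)]
    return sum(r <= target for r in ranges) / len(ranges)


# Hypothetical regional warming (degrees C) from six models; the last two
# stand in for weaker models at the warm end of the projections.
warming = [1.0, 1.1, 1.2, 1.3, 3.0, 3.5]
skilful = warming[:4]

# A small fraction means few same-size subsets are as narrow as the skilful
# one, so the narrowing is unlikely to be a sample-size effect alone.
print(fraction_as_narrow(warming, skilful))
```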

Fig. 6 Actual observed and simulated TMAX (left) and TMAX99 (right) (°C) for three regions of Australia. The observed value is shown in the first column. The 20th century data for all models, those with skill scores >0.7 and those >0.8 are shown next. In each case, the ensemble mean is shown, with the range of model simulations within the ensemble identified by the error bars. Results for A2 (2050), A2 (2100), B1 (2050) and B1 (2100) are also shown. The number of models in each sample is shown on the x-axis label

[Panels: mean TMAX and TMAX99 for Regions 2, 3 and 11; y-axis: temperature (Celsius). Number of models on the x-axis labels (all / >0.7 / >0.8):
Region 2: 20C 10/9/7; A2 2050 5/3/3; A2 2100 6/4/4; B1 2050 10/9/7; B1 2100 9/8/6
Region 3: 20C 10/7/5; A2 2050 5/3/3; A2 2100 6/4/3; B1 2050 10/7/5; B1 2100 9/6/5
Region 11: 20C 10/9/8; A2 2050 5/4/4; A2 2100 6/5/5; B1 2050 10/9/8; B1 2100 9/8/7]

Figure 7 shows the change in the frequency of TMAX99 (i.e., the frequency of this event in the current climate is once per year). The actual size of this event for each region is shown in Table 2. The two most important conclusions from Fig. 7 are that selecting models based on skill does not reduce the range of projections in the frequency of TMAX99 and that this range is extremely large. The actual projections

Fig. 7 As Fig. 6 but for the frequency (in days per year) of the current annual event by 2050 and 2100 under each emission scenario. The frequency of the current observed event is by definition 1.0

[Fig. 7 panels: Frequency of TMAX99 for Regions 2, 3 and 11. y-axis: frequency (days/year), 0 to 70. x-axis groups: A2 2050; A2 2100; B1 2050; B1 2100, each shown for all models and for skill scores >0.7 and >0.8, with the number of models in parentheses.]

Table 2 Observed and simulated TMAX (°C) and P (mm d−1)

Region      TMAX                              P
            Observed   20th century (>0.8)    Observed   20th century (>0.8)
Region 2    36.0       37.63                  29.08      20.50
Region 3    36.68      36.29                  31.09      21.21
Region 11   39.55      38.93                  18.91      16.82

Only those models with a skill score exceeding 0.8 are included in the model ensemble average

vary dramatically. Figure 7 is not included to provide guidance on the value for the frequency of the annual return value that might be used for impacts modeling; rather, it is included to illustrate that the uncertainty remains too high for this statistic to be generally useful. An exception to this is the increase in the frequency by 2100 under the A2 emission scenario. The minimum increase is three times per year (Region 2), six times per year (Region 3) and 18 times per year (Region 11). Clearly, experiencing the current mean annual maximum temperature 18 times per year would affect many biophysical and social systems.
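The frequency statistic discussed above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the annual event is taken as the 99.7th percentile of the baseline daily series (roughly one day in 365), and the function names, the synthetic data and the uniform 2 degree shift are all hypothetical.

```python
import numpy as np


def annual_event_threshold(daily_values, percentile=99.7):
    """Value exceeded on roughly one day per year in the baseline record."""
    return float(np.percentile(np.asarray(daily_values, dtype=float), percentile))


def exceedance_frequency(daily_values, threshold, days_per_year=365.0):
    """Mean number of days per year on which the series exceeds threshold."""
    values = np.asarray(daily_values, dtype=float)
    n_years = values.size / days_per_year
    return float(np.count_nonzero(values > threshold) / n_years)


# Synthetic, purely illustrative daily maximum temperatures (20 years).
rng = np.random.default_rng(0)
baseline = rng.normal(loc=30.0, scale=5.0, size=365 * 20)
future = baseline + 2.0  # hypothetical uniform 2 degree warming

threshold = annual_event_threshold(baseline)
freq_now = exceedance_frequency(baseline, threshold)
freq_future = exceedance_frequency(future, threshold)
print(f"baseline: {freq_now:.2f} days/yr, future: {freq_future:.2f} days/yr")
```

By construction the baseline frequency is close to one day per year, so the future value can be read directly as "how many times per year the current annual event occurs".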

4.2 Minimum temperature (TMIN0.3)

Figure 8 shows the result for TMIN0.3. Some warming is apparent across the whole of the continent under the B1 emission scenario by 2050 but the warming is mainly less than 1.5°C. This increases to 1.0–1.5°C by 2100. The impact of including the weaker models on this projected warming (Fig. 8b, d) is generally <0.3°C.

Warming of TMIN0.3 under the A2 emission scenario shows most warming in the sub-tropics and least warming along the southern coast in both 2050 (Fig. 8e) and 2100 (Fig. 8g). In 2050, amounts range from 1.0–1.5°C (south coast) to 2.0–2.5°C (sub-tropics). By 2100 warming is about 2.0–2.5°C (south coast) increasing to 3.5–4.0°C (sub-tropics). The warming projected by the skill-based ensemble (Fig. 8e, g) is systematically higher than the all-model ensemble. The differences between these ensembles (Fig. 8f, h) show additional warming of 0.1–1.0°C (2050) and mainly 0.3–1.0°C (2100). Localized areas of the tropics and eastern Victoria undergo less warming in the all-model ensemble.

The effect of excluding individual models by skill on the projected mean TMIN and TMIN0.3 is shown in Fig. 9. As with Fig. 7, as weaker models are omitted the remaining ensemble range in TMIN decreases, largely as a result of omission of models that simulate larger amounts of warming. No comparable decrease in TMIN0.3 occurs. We do not show the impact of warming on the annual frequency of TMIN. By 2050, even under the B1 emission scenario, the 20th Century annual TMIN no longer occurs.

Fig. 8 As Fig. 2 but for TMIN0.3 (°C)

[Fig. 9 panels: mean TMIN (left) and TMIN99 (right, as labelled on the panels) for Regions 2, 3 and 11. y-axis: temperature (°C). x-axis groups: Observed; 20C; A2 2050; A2 2100; B1 2050; B1 2100, each shown for all models and for skill scores >0.7 and >0.8, with the number of models in parentheses.]

Fig. 9 As Fig. 6 but for TMIN and TMIN0.3 (°C)

4.3 Precipitation (P99)

Figure 10 shows the results for P99 and shows that this statistic is clearly more sensitive to warming than P (Fig. 4). While changes in P of 0.1–0.5 mm d−1 occurred (0.5–1.0 mm d−1 in the tropics by 2050), P99 increases exceed 1 mm d−1 over virtually the whole continent and exceed 20 mm d−1 in the tropics. Including the weaker models decreases the change in P99 by 1–5 mm d−1 (Fig. 10b) over large areas and by 10–20 mm d−1 over the tropics by 2100 (Fig. 10d). The result is similar in the A2 emission scenario (Fig. 10e, g). Including the weaker models again decreases the change in P99, over large areas in 2050 but restricted to the tropics and eastern NSW in 2100 (Fig. 10h).

Fig. 10 As Fig. 2 but for P99 (mm d−1)

Figure 11 shows the impact of excluding models based on their skill score for P and P99. A very large range in projected rainfall remains even amongst those models with skill scores >0.8. No systematic picture emerges of a strong change in rainfall, even under an A2 emission scenario at 2100. A similarly large range remains for P99 and there is little clear evidence of a large increase in this statistic relative to the remaining uncertainty in the models. However, for all three regions, the reduction in the ensemble range from the all-model result to the result where only models with skill >0.7 are included shows that some models with very low rainfall are excluded. Those models that simulate low rainfall also simulate a low rainfall sensitivity; this

[Fig. 11 panels: mean P (left) and P99 (right) for Regions 2, 3 and 11. y-axis: precipitation (mm/day). x-axis groups: Observed; 20C; A2 2050; A2 2100; B1 2050; B1 2100, each shown for all models and for skill scores >0.7 and >0.8, with the number of models in parentheses.]

Fig. 11 As Fig. 6 but for P and P99 (mm d−1)


biases the results from the better models and tends to over-estimate the drying that is simulated over parts of Australia, or masks areas of increase in P and P99.

In terms of the frequency of the P99 event, Fig. 12 shows that some models simulate a very large increase in this frequency (14 times per year, Region 2 for example). However, this is, without exception, a result generated by the weaker models. If

Fig. 12 As Fig. 7 but for precipitation (days per year)

[Fig. 12 panels: Frequency of P99 for Regions 2, 3 and 11. y-axis: frequency (days/year), 0 to 16. x-axis groups: A2 2050; A2 2100; B1 2050; B1 2100, each shown for all models and for skill scores >0.7 and >0.8, with the number of models in parentheses.]

these are excluded, in each region the change in the frequency of P99 decreases dramatically. In Region 2, the once-per-year event increases to up to 3 times per year except in 2100 (A2 emission scenario) where it increases to 1–9 times a year. A similar result is clear in Fig. 12 for Regions 3 and 11 where the increase in the frequency of P99 is generally very small except under A2 (2100), although even here the range of projections from the good models includes no change.

5 Which models bias the results?

Perkins et al. (2007) identified, model by model, those with specific levels of skill over Australia based on PDFs of TMIN, TMAX and P. These skill scores were used

[Fig. 13 panels: mean TMAX difference (left) and TMAX99 difference (right) for Regions 2, 3 and 11. y-axis: temperature (°C). x-axis: AR4 model.]

Fig. 13 Change in TMAX (left) and TMAX99 (right) (°C) for the three regions for each model. Models colored red are omitted for a skill score <0.7, and models colored blue are omitted for a skill score <0.8. Those models in green are included in the >0.8 ensemble. The individual bars show changes projected for A2 (2050), A2 (2100), B1 (2050) and B1 (2100) if data were available

above to omit specific models from ensemble averages in the preceding sections. This section shows model-by-model results for the three regions to highlight the level of consistency between models and to note those models excluded at each threshold.
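The thresholding used to build the skill-based ensembles can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model names, skill scores and warming values below are hypothetical placeholders, not values from Perkins et al. (2007) or from this paper, and `skill_based_ensemble` is an illustrative helper rather than the authors' code.

```python
import numpy as np


def skill_based_ensemble(projections, skill_scores, threshold=None):
    """Summarise projections over models whose skill score exceeds threshold.

    threshold=None keeps every model (the all-model ensemble).
    """
    kept = [
        name for name in projections
        if threshold is None or skill_scores[name] > threshold
    ]
    if not kept:
        raise ValueError("no model passes the skill threshold")
    values = np.array([projections[name] for name in kept], dtype=float)
    return {
        "models": kept,
        "mean": float(values.mean()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


# Hypothetical models, skill scores and warming projections (degrees C).
skill = {"model_a": 0.91, "model_b": 0.84, "model_c": 0.76, "model_d": 0.62}
warming = {"model_a": 2.4, "model_b": 2.0, "model_c": 3.1, "model_d": 1.3}

all_models = skill_based_ensemble(warming, skill)
above_07 = skill_based_ensemble(warming, skill, threshold=0.7)
above_08 = skill_based_ensemble(warming, skill, threshold=0.8)
print(all_models["mean"], above_07["mean"], above_08["mean"])
```

Reporting the minimum and maximum alongside the mean mirrors the error bars in the regional figures, where the ensemble range is shown for each sample.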

For TMAX, every climate model simulates an increase and almost all models simulate an increase in TMAX99 (Fig. 13). The range in the increase in TMAX is generally as expected with substantially more warming by 2100 under the A2 emission scenario in most models. CGCM-h and CGCM-l are omitted in Regions 2 and 3 while CGCM-l is omitted in Region 11 using a threshold of 0.7. Since CGCM-l does not appear to be anomalous in the amount of projected warming this has little impact on the ensemble average. A 0.8 threshold has a larger impact in terms of the number of models omitted but there is no obvious bias in these models in terms of their projected warming and the change in the projected mean warming is small.

In the case of TMAX99 (Fig. 13), CGCM-h and CGCM-l (Regions 2 and 3), MRI (Region 3) and CGCM-l (Region 11) are omitted using a threshold of 0.7. The omitted models do not appear to be systematically different from those remaining

[Fig. 14 panels: mean TMIN difference (left) and TMIN99 difference (right, as labelled on the panels) for Regions 2, 3 and 11. y-axis: temperature (°C). x-axis: AR4 model.]

Fig. 14 As Fig. 13 but for TMIN and TMIN0.3 (°C)

in the ensemble. Using a 0.8 threshold, several models that simulate relatively large increases in TMAX are omitted but most models that simulate small increases are retained. This significantly reduces the overall range amongst the remaining models. Similar results are found for TMIN (Fig. 14). Note that models that simulate apparently anomalous results for TMIN (CSIRO, IPSL) are not omitted using a 0.8 threshold and thus it is appropriate to retain these model estimates in any uncertainty range. This illustrates the danger of omitting an apparently anomalous model (e.g. Stainforth et al. 2007) without good reason.

Figure 15 shows the result for P. There is little evidence for a systematic pattern of either increase or decrease amongst the models. The simplest conclusion for all three regions is that those models that simulate the largest change in P are exclusively those with limited skill in simulating the observed (GISS ER, MIROC-m, CCSM for example). A similar conclusion can be reached for P99 (Fig. 15). In all cases, it is those models that simulate the largest changes that have the least skill.

Finally, Fig. 16 shows the number of days per year each model simulates as having no rainfall (defined as less than 0.2 mm d−1). Amongst those models that have skill

[Fig. 15 panels: mean P difference (left) and P99 difference (right) for Regions 2, 3 and 11. y-axis: precipitation (mm/day). x-axis: AR4 model.]

Fig. 15 As Fig. 13 but for P and P99 (mm d−1)


scores >0.7 there is a strong indication of a substantial increase in rain-free days in all three regions. Overall the skilful models hint at a small mean rainfall increase (Fig. 16). However, the better models simulate less change than the weaker models, implying that weak models tend to overestimate changes in some rainfall statistics.
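Counting rain-free days with the 0.2 mm d−1 cut-off quoted above is straightforward; a minimal sketch follows. The function name, the 365-day year and the synthetic gamma-distributed rainfall are illustrative assumptions, not the authors' data or code.

```python
import numpy as np


def rain_free_days_per_year(daily_precip, dry_threshold=0.2, days_per_year=365.0):
    """Mean number of days per year with precipitation below dry_threshold (mm/day)."""
    values = np.asarray(daily_precip, dtype=float)
    n_years = values.size / days_per_year
    return float(np.count_nonzero(values < dry_threshold) / n_years)


# Synthetic, purely illustrative daily rainfall series (20 years each).
rng = np.random.default_rng(1)
baseline_precip = rng.gamma(shape=0.4, scale=5.0, size=365 * 20)
future_precip = rng.gamma(shape=0.3, scale=7.0, size=365 * 20)

change = (rain_free_days_per_year(future_precip)
          - rain_free_days_per_year(baseline_precip))
print(f"change in rain-free days per year: {change:+.1f}")
```

A positive `change` corresponds to an increase in rain-free days, the quantity plotted model by model in Fig. 16.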

Fig. 16 As Fig. 13 but for the change in the frequency of no rain days (defined as <0.2 mm d−1)

[Fig. 16 panels: yearly change in no rain days for Regions 2, 3 and 11. y-axis: number of days <=0.2 mm per year, −300 to 300. x-axis: AR4 model.]

6 Discussion

Randall et al. (2007) assessed climate models and concluded that they provide reliable estimates of future climate particularly at continental scales and above. Perkins et al. (2007) assessed the regional (10° × 10°) performance of the AR4 climate models over Australia using a skill score derived from daily PDFs and showed that some AR4 models demonstrated considerable skill at regional scales for all regions for all of precipitation, maximum and minimum temperature. Thus, at least over Australia, Randall et al.’s (2007) conclusion could be broadened to indicate that some AR4 models have credible quantitative skill in simulating current regional climates. However, Perkins et al. (2007) showed that some AR4 models had relatively little skill. This paper explored whether omitting these models led to substantially different projections of regional warming and precipitation change.

We showed in this paper that if we omit models with skill scores <0.8 the amount of warming projected in TMAX and TMIN generally increases (0.1–0.3°C) over the all-model ensemble average with the exception of parts of Victoria and the sub-tropics (TMAX) and tropics (TMIN) under B1. The additional warming simulated by the better models was larger (0.3–1.0°C) in TMIN by 2100 under A2. There is no a priori reason to expect this; rather, we anticipated that the weaker models would predict changes that were randomly distributed about the mean of the better models. The differences between an all-model and skill-based ensemble in the projected warming are mainly <1.0°C. However, at the scales of impacts assessment on human health, ecosystems, etc., and given that this additional warming is identified merely by omitting those models with less capacity to simulate the observed, we suggest that it would be misleading to use the lower projected warming derived from the all-model ensemble. Omitting the weaker models leads to considerable regional variations in the amount of warming in TMAX99 and TMIN0.3. More warming is projected by the better models for TMAX99 in the southern third of Australia (mainly 0.1–1.0°C) but less warming is projected by the better models over large regions under the A2 emission scenario (Fig. 5f, h) reaching 1.0–2.0°C locally. The weaker models mask a band of extremely high warming along the south coast of Australia. Over three selected regions we also showed that a substantial increase in agreement in the projected temperature changes occurred as weak models were omitted. This led to a useful reduction in the range of projections for most regions in TMAX and TMAX99 (Fig. 6) and TMIN and TMIN0.3 (Fig. 9). However, in terms of the frequency of projected occurrence of TMAX99, there was no reduction in the projected range in any region (Fig. 7). Even the best models show very limited agreement in this statistic, which is likely important in affecting many biological and human systems.

In terms of rainfall, our analysis of the more skilful AR4 models points to a tendency for higher rainfall over large regions of Australia. The better models tend to simulate a larger increase (Fig. 4) over NSW, Queensland and the tropics. This increase in rainfall is small but is significantly masked by weaker models. The A2 scenario shows a very worrying change in P over Australia particularly by 2100 which is not due to weak models. Christensen et al. (2007) show results for the A2 scenario for 2100. Our equivalent skill-based model ensemble (Fig. 4g) shows a similar pattern of changes to their figure 11.17. However, over most of Australia, the result presented by Christensen et al. (2007) is uncertain since only about half of the AR4 models agree on whether rainfall will increase. Our analysis indicates that the projections from the all-model ensemble do not change a great deal as the weak models are omitted (suggesting that these models either simulate a change similar to the good models or the net effect of several weak models is small). Thus, the percentage change in rainfall shown by Christensen et al. (2007) is probably not biased by weak models. In particular, the drying trend over south west Western Australia is strongly reproduced in the better models but likely extends to southern Victoria and South Australia. This drying along the southern coastline contributes strongly to the large increase in TMAX99 in these regions.

The skill-based ensemble averages show continent-wide increasing amounts of precipitation occurring at the P99 event (Fig. 10). The better models tend to simulate a larger increase in this statistic than the weaker models but the difference is relatively minor. However, Fig. 16 shows a clear impact of increasing greenhouse gases on the frequency of rainfall with almost all models showing an increase in rain-free days. The exception (IPSL) is one of the weaker models for Region 3. Thus, the suggestion that global warming will lead to less frequent rainfall, but heavier rainfall when it occurs, is borne out in our findings. In most cases, when a model provides a contrary result (GISS AOM and GISS ER in Regions 2, 3, 11), Fig. 16 indicates that the model is among the weaker models.

We have explored why changes in TMAX99 are only marginally greater than the change in TMAX. While there are suggestions that change in extremes under global warming would be greater than changes in the mean, many of these suggestions stem from quite early work. Even the foundation work by Mearns et al. (1984, 1990) suggested that extremes may change more than indicated by a change in the mean if both the location and scale parameters of the distribution changed.

In terms of temperature, our results showed a higher change in TMAX99 than TMAX, but the differences were generally small except where the changes in P were significant. To obtain a large increase in TMAX99 relative to TMAX, the location and the shape of the PDF have to change: if the distribution represented by the PDF merely shifts relative to the x-axis, the change in the 99.7th percentile would be the same as the change in the mean. We explored this by examining the change in the shape of the simulated PDFs by subtracting the change in the mean between the 20th Century results and the 2050/2100 results and re-calculating the 21st century PDFs. By removing the mean change between the two PDFs, shifts in the distribution are removed, clarifying the influence of any change in the scale of the distribution. We then measured the change in the shape of the distribution and quantified this in the same way as the skill score discussed earlier (i.e. a measure of overlap between the two PDFs).
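The shape comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: overlap is taken here as the sum over common bins of the smaller of the two relative frequencies, the 0.5 degree bin width is arbitrary, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def pdf_overlap(sample_a, sample_b, bin_width=0.5):
    """Overlap of two empirical PDFs: sum of bin-wise minimum relative frequencies."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    lo = np.floor(min(a.min(), b.min()))
    hi = np.ceil(max(a.max(), b.max()))
    if hi <= lo:  # guard against a degenerate single-value range
        hi = lo + bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    freq_a, _ = np.histogram(a, bins=edges)
    freq_b, _ = np.histogram(b, bins=edges)
    return float(np.sum(np.minimum(freq_a / a.size, freq_b / b.size)))


def shape_overlap(baseline, future):
    """Overlap after subtracting the mean change, so only shape differences remain."""
    baseline = np.asarray(baseline, dtype=float)
    future = np.asarray(future, dtype=float)
    mean_change = future.mean() - baseline.mean()
    return pdf_overlap(baseline, future - mean_change)


# Synthetic, purely illustrative daily temperatures.
rng = np.random.default_rng(2)
baseline = rng.normal(loc=30.0, scale=5.0, size=5000)
future_shift = baseline + 3.0  # pure shift: shape unchanged
future_wide = baseline.mean() + 1.3 * (baseline - baseline.mean()) + 3.0  # shift plus wider spread

print(pdf_overlap(baseline, future_shift))    # raw overlap is reduced by the shift
print(shape_overlap(baseline, future_shift))  # close to 1 once the mean change is removed
print(shape_overlap(baseline, future_wide))   # below 1: the shape itself has changed
```

For the pure-shift case the change in the 99.7th percentile equals the change in the mean, which is the point made in the text; only the widened case leaves a residual difference after the mean change is removed.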

Table 3 shows the amount of overlap between the 20th Century and future PDFs for each model for Region 2 (we only look at Region 2 and TMAX as results for TMIN and other regions are very similar). The 20th Century PDF and a PDF derived for 2050 or 2100 overlap by at least 90% in all models, and Table 3 shows the overlap between the various PDFs typically exceeds 95% once the mean change is removed, indicating small changes in the shape of the PDFs by 2050 and 2100. However, a 5%–10% change in the shape of the PDF, if isolated at the tails, could still generate important changes in temperature and/or rainfall extremes.

Christensen et al.’s (2007) assessment of the changes in temperature and rainfall is, according to our results, affected by the inclusion of demonstrably weak models.

Table 3 Overlap statistics for each PDF between the 20th century PDF and each future PDF for Region 2

Region 2    A2 2050   A2 2100   B1 2050   B1 2100
CGCM-h      –         –         0.98      0.98
CGCM-l      0.96      0.94      0.95      0.96
CSIRO       0.93      0.93      0.94      0.94
GISS AOM    –         –         0.98      –
FGOALS      –         –         0.95      0.94
IPSL        –         0.95      0.94      0.96
MIROC-h     –         –         0.96      0.96
MIROC-m     0.97      0.96      0.97      0.97
ECHO-G      0.95      0.95      0.97      0.95
MRI         0.98      0.97      0.98      0.97

The future PDF has been modified to remove the mean change from the 20th century, thus these statistics measure a change in the shape, not of the location of the PDF

The change in the assessment is, however, not dramatic and some of their statements can now be supported more strongly than at the time of the AR4. They state:

All of Australia . . . [is] very likely to warm during this century . . . comparable overall to the global mean warming. The warming is smaller in the south, especially in winter . . . Increased frequency of extreme high daily temperatures [will occur] in Australia . . . and [a] decrease in the frequency of cold extremes is very likely

We agree with this statement, with the caveat that while the mean warming is least in the south and most in the north, the warming in the 99.7th percentile is clearly strongest in the south. The feedback with declining rainfall was partially hidden by weak models and is dramatic in major population centres by 2050 under B1 and A2 emission scenarios. We note that the amount of warming in the mean is likely more, while the amount of warming in the extremes is generally less than anticipated. We also note that there is extremely poor agreement amongst the better models in the change in the frequency of hot days.

Precipitation is likely to decrease in Southern Australia in winter and spring. Precipitation is very likely to decrease in Southwestern Australia in winter . . .

Our results strongly suggest that a decline in rainfall is very likely along the southern coast from Western Australia to Victoria, intensifying under higher emissions and further into the future. This decline extends northwards along the coast of Western Australia and New South Wales. This is confronting given Australia needs rainfall to increase to offset enhanced evaporative drying projected in the models.

6.1 Changes in rainfall in Northern and Central Australia are uncertain

Our results suggest increasing rainfall over Northern and Central Australia. These increases are masked by weak models and emerge strongly from the better models.

6.2 Extremes of daily precipitation will very likely increase

We agree, but note that the scale of increase is small and remains uncertain in the models. We see no evidence that the increase in daily precipitation extremes exceeds 10 mm on the annual event except in the tropics. This may, of course, be due to systematic weaknesses in the simulation of rainfall by models (Sun et al. 2007).

Overall, our analysis therefore provides some clarity to the confidence in Christensen et al.’s (2007) assessment. What does this mean for impacts modellers and users of climate model data? If impacts modellers use climate model data we strongly suggest they evaluate these models for their specific purposes. We strongly suggest that as fine a timescale as possible is used, as monthly averages can hide considerable problems. Perkins et al. (2007) showed that, over Australia, some of the AR4 models were very impressive while others were severely limited. Impacts modellers should be aware of this and choose climate model results expressly for their individual purposes. Our results also highlight significant concerns in using all-model ensembles. We advocate the use of individual model results (ideally individual realizations from a suite of skilful climate models, see Beaumont et al. 2007). If this is impractical, then we recommend using multi-model ensembles in which each member has been chosen on the basis of its ability to simulate the variables of interest.

7 Conclusions

Systematic evaluation provides a foundation for our growing confidence in climate model projections. There is no one way to evaluate climate models and a range of model metrics are likely to be developed. Perkins et al.’s (2007) metric has the advantage of measuring the ability of a climate model across the whole of a PDF and therefore gives some guidance on the ability of a model at the tails of the simulated distribution. The quantitative nature of the PDF-based skill score also provides an objective basis for omitting models from an ensemble. The use of daily data avoids biases being hidden through averaging.

Over regional Australia, we showed that omitting some of the AR4 climate models based on skill scores <0.8 tended to lead to an increase in the mean projected warming in TMAX and TMIN by up to an additional 1°C by 2100. The omission of the weaker models led to a reduction in the amount of warming in some regions, suggesting overall that regional-scale projections of warming can be affected via a systematic removal of the weaker models in any ensemble. Removal of the weaker models led to large and coherent patterns of change in TMAX99 and TMIN0.3. We showed in a regional analysis that removing weaker models clarified the likely changes in future projections in many cases, and that the best models generally projected less warming in TMAX99 and TMIN0.3 than the all-model ensemble. However, the amount of reduction in the warming was small relative to the projected temperature increases. We showed the change in the frequency of TMAX99, demonstrating that the annual event could increase dramatically by 2100 under the A2 emission scenario. However, we also showed that the best models still simulated large differences in how this statistic might change. Since we are focused on projections we cannot prove that our methodology is better, but we do not see the merit in including models that cannot capture the observed system well in multi-model ensembles, particularly when these hide changes that are clear among the better models.
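The comparison between an all-model ensemble and an ensemble restricted to the more skilful models can be sketched as follows. Only the 0.8 skill threshold comes from the text; the model names, skill scores and projected changes are invented purely for illustration and do not correspond to any AR4 model or to results from this study.

```python
import numpy as np

SKILL_THRESHOLD = 0.8  # models scoring below this are omitted, as in the text

# Hypothetical ensemble: name -> (skill score, projected warming in degC).
# All values are made up for illustration.
models = {
    "model_a": (0.91, 3.4),
    "model_b": (0.85, 3.1),
    "model_c": (0.72, 2.2),
    "model_d": (0.64, 2.5),
}


def ensemble_mean(models, min_skill=None):
    """Mean projected change over models with skill >= min_skill.

    With min_skill=None every model is included (the all-model ensemble).
    """
    changes = [
        change
        for skill, change in models.values()
        if min_skill is None or skill >= min_skill
    ]
    if not changes:
        raise ValueError("no models meet the skill threshold")
    return float(np.mean(changes))


all_model_mean = ensemble_mean(models)
skilful_mean = ensemble_mean(models, SKILL_THRESHOLD)
difference = skilful_mean - all_model_mean
print(f"all models: {all_model_mean:.2f} degC, "
      f"skilful only: {skilful_mean:.2f} degC, "
      f"difference: {difference:+.2f} degC")
```

In practice the same filter would be applied region by region and variable by variable, since a model that is skilful for TMAX in one region need not be skilful for precipitation in another.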

In terms of precipitation, removing the weaker models changed the projected patterns of change. Overall, the impact of increasing greenhouse gases on rainfall amount was an increase over the tropics and parts of the east coast and little change or decreases over most coastal regions. The reduced rainfall over south-west Western Australia in the all-model ensemble was reproduced when only the best models were included. We also showed that the increase in the annual precipitation event was robust if only the best models were included. Thus over most regions of Australia, increasing greenhouse gases appear to cause small increases in rainfall amount, small increases in the annual rainfall event, increases in the frequency of an event which currently occurs annually and increases in the number of rain-free days occurring each year.
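The rainfall statistics discussed above can be derived from daily series along the following lines. This is a sketch under stated assumptions: the annual event is approximated by the 99.7th percentile of daily values (roughly one day in 365), a rain-free day is defined with a hypothetical 1 mm threshold, and the two rainfall series are synthetic rather than model output.

```python
import numpy as np

ANNUAL_EVENT_PERCENTILE = 99.7  # roughly 1 day in 365 (assumed definition)
WET_DAY_MM = 1.0                # hypothetical rain-day threshold


def annual_event(daily, percentile=ANNUAL_EVENT_PERCENTILE):
    """Daily amount exceeded on average about once per year."""
    return float(np.percentile(daily, percentile))


def exceedances_per_year(daily, threshold, days_per_year=365):
    """Average number of days per year with values above the threshold."""
    return float(np.mean(daily > threshold) * days_per_year)


def dry_days_per_year(daily_rain, wet_day_mm=WET_DAY_MM, days_per_year=365):
    """Average number of days per year with rainfall below the wet-day threshold."""
    return float(np.mean(daily_rain < wet_day_mm) * days_per_year)


# Synthetic 20-year daily rainfall (mm): mostly dry days, gamma-distributed wet days.
rng = np.random.default_rng(1)
n_days = 365 * 20
present = np.where(rng.random(n_days) < 0.30, rng.gamma(0.8, 8.0, n_days), 0.0)
future = np.where(rng.random(n_days) < 0.25, rng.gamma(0.8, 14.0, n_days), 0.0)

threshold = annual_event(present)  # present-day annual event
present_freq = exceedances_per_year(present, threshold)
future_freq = exceedances_per_year(future, threshold)
print(f"annual event: {threshold:.1f} mm")
print(f"exceedances per year: present {present_freq:.1f}, future {future_freq:.1f}")
print(f"dry days per year: present {dry_days_per_year(present):.0f}, "
      f"future {dry_days_per_year(future):.0f}")
```

Holding the threshold fixed at its present-day value is what allows the frequency of the current annual event to be tracked separately from any change in the size of the future annual event.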

In several cases we identify models that appear to project a change that is anomalous. For example, while most models simulate a decreasing frequency of rain days, some AR4 models simulated an increased frequency. We show that the anomalous models often had a low skill score and we therefore provide a rationale for either omitting these models or reducing the attention paid to them in subsequent analyses.

Overall, by omitting models based on a quantitative skill score we clarify the resulting regional climate projections over Australia. There is no a priori reason why this is necessarily true. It was not obvious that weak models should tend towards over-predicting warming in the 99.7th percentile or under-predicting the annual rainfall event. The differences in the projections achieved via omitting weak models are not substantial relative to the changes projected due to increased greenhouse gases at a continental scale. However, at regional scales, using our projected changes in climate would likely provide a more reliable set of climate simulations and reduce those biases in ensembles generated by climate models that we have shown to be weak.

Acknowledgements We acknowledge the international modeling groups for providing their data for analysis, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) for collecting and archiving the model data, the JSC/CLIVAR Working Group on Coupled Modeling (WGCM) and their Coupled Model Intercomparison Project (CMIP) and Climate Simulation Panel for organizing the model data analysis activity, and the IPCC WG1 TSU for technical support. The IPCC Data Archive at Lawrence Livermore National Laboratory is supported by the Office of Science, U.S. Department of Energy. We also thank Silicon Graphics for help in porting Matlab.

References

Beaumont L, Pitman AJ, Hughes L, Poulsen M (2007) Within and between climate model variability in the simulation of the impact of climate change on butterfly species in Australia. Glob Chang Biol 13:1368–1385. doi:10.1111/j.1365-2486.2007.01357.x

Christensen JH, Hewitson B, Busuioc A, Chen A, Gao X, Held I, Jones R, Kwon W-T, Laprise R, Rueda VM, Mearns L, Menéndez CG, Räisänen J, Rinke A, Kolli RK, Sarr A, Whetton P (2007) Regional climate projections. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) Climate change 2007: the scientific basis. Contribution of working group I to the fourth assessment report of the intergovernmental panel on climate change. Cambridge University Press, New York

Colombo A, Etkin D, Karney B (1999) Climate variability and the frequency of extreme temperature events for nine sites across Canada: implications for power usage. J Climate 12:2490–2502

Easterling DR, Meehl GA, Parmesan C, Changnon SA, Karl TR, Mearns LO (2000) Climate extremes: observations, modeling, and impacts. Science 289:2068–2074

Griffiths GM, Chambers LE, Haylock MR, Manton MJ, Nicholls N, Baek H-J, Choi Y, Della-Marta PM, Gosai A, Iga B, Laurent V, Maitrepierre L, Nakamigawa H, Ouprasitwong N, Solofa D, Tahani L, Thuy DT, Tibig L, Trewin B, Vediapan K, Zhai P (2005) Change in mean temperature as a predictor of extreme temperature change in the Asia-Pacific region. Int J Climatol 25:1301–1330

Hennessy KJ, Pittock AB (1995) Greenhouse warming and threshold temperature events in Victoria, Australia. Int J Climatol 15:591–612

Johns TC, Durman CF, Banks HT, Roberts MJ, Mclaren AJ, Ridley JK, Senior CA, Williams KD, Jones A, Rickard GJ, Cusack S, Ingram WJ, Crucifix M, Sexton DMH, Joshi MM, Dong B-W, Spencer H, Hill RSR, Gregory JM, Keen AB, Pardaens AK, Lowe JA, Bodas-Salcedo A, Stark S, Searl Y (2006) The New Hadley Centre climate model (HadGEM1): evaluation of coupled simulations. J Climate 19:1327–1353

Katz R, Brown B (1992) Extreme events in a changing climate: variability is more important than averages. Clim Change 21:289–302

Kharin VV, Zwiers FW (2005) Estimating extremes in transient climate change simulations. J Climate 18:1156–1173

Knutti R, Meehl GA, Allen MR, Stainforth DA (2006) Constraining climate sensitivity from the seasonal cycle in surface temperature. J Climate 19:4224–4233

Luo Q, Jones RN, Williams M, Bryan B, Bellotti W (2005) Probabilistic distributions of regional climate change and their application in risk analysis of wheat production. Clim Res 29:41–52

Mearns LO, Katz R, Schneider S (1984) Extreme high-temperature events: changes in the probabilities with changes in mean temperature. J Appl Meteor 23:1601–1613

Mearns LO, Schneider AH, Thompson SL, McDaniel LR (1990) Analysis of climate variability in general circulation models: comparison with observations and changes in variability in 2xCO2 experiments. J Geophys Res 95:20469–20490

Meehl GA, Zwiers F, Evans J, Knutson T, Mearns LO, Whetton PH (2000) Trends in extreme weather and climate events: issues related to modeling extremes in projections of future climate change. Bull Amer Meteor Soc 81:427–436

Murphy JM, Sexton DMH, Barnett DN, Jones GS, Webb MJ, Collins M, Stainforth DA (2004) Quantification of modeling uncertainties in a large ensemble of climate change simulations. Nature 430:768–772

Parkinson G (ed) (1986) Atlas of Australian resources, third series, vol 4. Climate, Commonwealth of Australia, Canberra, Australia, 60 pp

Perkins SE, Pitman AJ, Holbrook NJ, McAneney J (2007) Evaluation of the AR4 climate models’ simulated daily maximum temperature, minimum temperature and precipitation over Australia using probability density functions. J Climate 20:4356–4376

Piani C, Frame DJ, Stainforth DA, Allen MR (2005) Constraints on climate change from a multi-thousand member ensemble of simulations. Geophys Res Lett 32:Art No L23825

Pitman AJ, Stouffer RJ (2006) Abrupt change in climate and climate models. Hydrol Earth Syst Sci 10:903–912

Randall D, Wood RA, Bony S, Colman R, Fichefet T, Fyfe J, Kattsov V, Pitman AJ, Shukla J, Srinivasan J, Stouffer RJ, Sumi A, Taylor K (2007) Climate models and their evaluation. In: Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) Climate change 2007: the scientific basis. Contribution of working group I to the fourth assessment report of the intergovernmental panel on climate change. Cambridge University Press, New York

Shukla J, DelSole T, Fennessy M, Kinter J, Paolino D (2006) Climate model fidelity and projections of climate change. Geophys Res Lett 33:L07702. doi:10.1029/2005GL025579

Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) (2007) Climate change 2007: the scientific basis. Contribution of working group I to the fourth assessment report of the intergovernmental panel on climate change. Cambridge University Press, New York

Stainforth DA, Allen MR, Tredger ER, Smith LA (2007) Confidence, uncertainty and decision-support relevance in climate predictions. Philos Trans R Soc A 365:2145–2161. doi:10.1098/rsta.2007.2074

Sun Y, Solomon S, Dai A, Portmann RW (2007) How often will it rain? J Climate 20:4801–4818. doi:10.1175/JCLI4263.1

Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res 106:7183–7192

Tebaldi C, Smith RL, Nychka D, Mearns LO (2005) Quantifying uncertainty in projections of regional climate change: a Bayesian approach to the analysis of multimodel ensembles. J Climate 18:1524–1540. doi:10.1175/JCLI3363.1

Trigo RM, García-Herrera R, Díaz J, Trigo IF (2005) How exceptional was the early August 2003 heatwave in France? Geophys Res Lett 32:L10701. doi:10.1029/2005GL022410

Watterson IG (1996) Non-dimensional measures of climate model performance. Int J Climatol 16:379–391

Yonetani T, Gordon HB (2001) Simulated changes in the frequency of extremes and regional features of seasonal/annual temperature and precipitation when atmospheric CO2 is doubled. J Climate 14:1765–1779