
Towards an operational MODIS continuous field of percent tree cover

algorithm: examples using AVHRR and MODIS data

M.C. Hansen a,*, R.S. DeFries a,b, J.R.G. Townshend a,c, R. Sohlberg a, C. Dimiceli a, M. Carroll a

a Department of Geography, University of Maryland, 2181 LeFrak Hall, College Park, MD 20742, USA

b Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD 20742, USA

c Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA

Received 1 May 2001; received in revised form 21 February 2002; accepted 12 March 2002

Abstract

The continuous fields Moderate Resolution Imaging Spectroradiometer (MODIS) land cover products are 500-m sub-pixel representations

of basic vegetation characteristics including tree, herbaceous and bare ground cover. Our previous approach to deriving continuous fields

used a linear mixture model based on spectral endmembers of forest, grassland and bare ground training. We present here a new approach for

estimating percent tree cover employing continuous training data over the whole range of tree cover. The continuous training data set is

derived by aggregating high-resolution tree cover to coarse scales and is used with multi-temporal metrics based on a full year of coarse

resolution satellite data. A regression tree algorithm is used to predict the dependent variable of tree cover based on signatures from the multi-

temporal metrics. The automated algorithm was tested globally using Advanced Very High Resolution Radiometer (AVHRR) data, as a full

year of MODIS data has not yet been collected. A root mean square error (rmse) of 9.06% tree cover was found from the global training data

set. Preliminary MODIS products are also presented, including a 250-m map of the lower 48 United States and 500-m maps of tree cover and

leaf type for North America. Results show that the new approach used with MODIS data offers an improved characterization of land cover.

© 2002 Elsevier Science Inc. All rights reserved.

1. Introduction

Tree cover mapping has grown in importance as the need

to quantify global tree stocks has increased. Tree cover is an

important variable for modeling of global biogeochemical

cycles and climate (Sellers et al., 1997; Townshend et al.,

1994). Additionally, tree cover mapping has taken on

increased importance in the policy arena. Quantifying carbon

stocks has been deemed a necessity in global treaties regard-

ing release and sequestration of carbon to and from the

atmosphere (IGBP, 1998). The use of tree cover mapping

in assessing the condition of global ecosystems is also

important (Ayensu, Claasen, Collins, et al., 1999). In order

to meet the needs of the users of such data, the remote sensing

community has begun to promote the benefits of the synoptic,

standardized view provided by satellite data (DeFries, Hansen,

Townshend, Janetos, & Loveland, 2000). One of the

annual Moderate Resolution Imaging Spectroradiometer

(MODIS) land cover products is the set of vegetation continuous

fields layers. The layers include percent bare ground, herbaceous

and tree cover and, for tree cover, percent evergreen,

deciduous, needleleaf and broadleaf. These maps have the

potential to meet many of the needs of both the scientific and

policy communities. This paper describes a methodology

for deriving percent tree cover estimates that improves on

previous methodologies. The procedure is presented along

with a global Advanced Very High Resolution Radiometer

(AVHRR) application and two examples using MODIS data.

Continuous fields of vegetation properties offer advan-

tages over traditional discrete classifications. By depicting

each pixel as a percent coverage, areas of heterogeneity are

better represented. Discrete classes do not allow for the

depiction of variability for spatially complex areas (DeFries,

Field, Fung, et al., 1995). Many spatially complex areas

occur because of anthropogenic land cover change. By

using proportional estimates, sub-pixel cover can be mapped

with the prospect of measuring change over time. Since the

scale of human-induced land cover change is typically finer

than 250-m (Townshend & Justice, 1988), continuous fields

from MODIS data may yield a usable land cover change

product.

* Corresponding author. Tel.: +1-301-314-2585. E-mail address: [email protected] (M.C. Hansen).

Remote Sensing of Environment 83 (2002) 303–319. www.elsevier.com/locate/rse

0034-4257/02/$ - see front matter © 2002 Elsevier Science Inc. All rights reserved. PII: S0034-4257(02)00079-2

2. Procedure

The approach presented in this paper for mapping con-

tinuous fields of tree cover differs from that of the initial

prototype (DeFries et al., 2000). Fig. 1 outlines the proto-

type methodology and the improved technique presented

here. The two approaches share one feature: the use of

annual phenological metrics as the independent variables to

predict tree cover. They differ in the following ways:

• the new technique is fully automated

• the new training data set is a continuous variable, not discrete class labels

• the new algorithm is a regression tree as opposed to a linear mixture model modified by a land cover classification

• the new approach operates globally, without per-continent adjustments of the mixture model.

The most important advancement is the automation of

the algorithm. The prototype approach relied on a classi-

fication methodology which was partially dependent on an

expert interpreter’s input (Hansen, DeFries, Townshend, &

Sohlberg, 2000). This step has been eliminated in the new

technique. The main parts integral to the methodology are

described in the following sections.

2.1. Annual metrics

Global multi-temporal metrics capture the salient points

of phenological variation by calculating annual means,

maxima, minima and amplitudes of spectral information.

The value of metric generation versus using a series of

monthly values is that the metrics are not sensitive to time of

year or the seasonal cycle and can limit the inclusion of

atmospheric contamination. Fig. 2 shows monthly values for

red reflectance from AVHRR data for February 1995 to

January 1996 for the Amazon basin. Use of any individual

month would include cloud contamination whereas the

annual minimum provides a cleaner metric for viewing land

cover.
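The idea of collapsing monthly composites into annual metrics can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name `annual_metrics`, the array layout (months, rows, columns) and the reflectance values are all assumptions made for the example.

```python
import numpy as np

def annual_metrics(monthly):
    """Summarize a (months, rows, cols) stack of monthly composites per pixel."""
    monthly = np.asarray(monthly, dtype=float)
    minimum = monthly.min(axis=0)
    maximum = monthly.max(axis=0)
    return {
        "min": minimum,
        "max": maximum,
        "mean": monthly.mean(axis=0),
        "amplitude": maximum - minimum,
    }

# Made-up red reflectance (percent) for one forested pixel; bright months mimic cloud.
red = np.array([30, 4, 25, 5, 40, 6, 35, 4, 28, 5, 33, 6], dtype=float).reshape(12, 1, 1)
metrics = annual_metrics(red)
print(metrics["min"][0, 0], metrics["amplitude"][0, 0])
```

In this toy series any single month may be cloud-brightened, while the annual minimum keeps the darkest, least contaminated value, which is the behavior described for Fig. 2.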

Fig. 1. Flow chart of major steps in generation of global continuous field of tree cover products for (a) prototype methodology of DeFries et al. (2000) and (b) MODIS implementation.

Fig. 2. Derived minimum annual red reflectance from monthly composites of red reflectance associated with maximum monthly NDVI for (a) January 1996, (b) February 1995, (c) March 1995, (d) April 1995, (e) May 1995, (f) June 1995, (g) July 1995, (h) August 1995, (i) September 1995, (j) October 1995, (k) November 1995, (l) December 1995. (m) is derived metric. All 13 subsets have the same image enhancement applied.

Fig. 3. (a) AVHRR metrics for area in central Africa: red = maximum annual NDVI, cyan = minimum annual red reflectance; (b) AVHRR metric of mean temperature of the four warmest months from band 5; (c) continuous tree cover result; (d) high-resolution imagery, false color composite for an area in the Democratic Republic of the Congo; (e) classified high-resolution imagery: green = forest (80% canopy cover), dark maroon = woodland (50% canopy cover), light maroon = parkland (25% canopy cover), yellow = no trees (0% canopy cover); (f) derived training data by aggregating classified image to 500-m pixels.

Fig. 3 shows another example of the utility of metrics

from Central Africa. Here, the maximum annual Normalized

Difference Vegetation Index (NDVI) is shown with the

minimum annual red reflectance metric. Minimum annual

red reflectance is negatively correlated with tree cover as the

combined effects of chlorophyll absorption and canopy

shadowing make denser tree cover darker. Maximum annual

NDVI, on the other hand, has a positive correlation with tree

cover as increasing leaf area of canopies makes forests

appear greener. However, for this area, woodlands of

approximately 60% cover are indistinguishable from denser

forests for these metrics. Another metric based on surface

temperature allows for the stratification of these two areas

using the regression tree. The four warmest months of the

year based on surface temperature correlate with the dry

season as the seasonal woodlands have senesced and evap-

otranspiration is lower: this allows for a clean delineation of

the forest/woodland boundary. These metrics also discrim-

inate the northern edge of the Central African rainforest as

they are insensitive to the specific time of year. Metric

generation will continue to develop using MODIS data as a

full year of consistent data becomes available and the full

global suite of metrics can be derived.

The metrics to be tested will mimic those for this work

shown in Table 1. Each band is ranked individually and also

ordered by corresponding greenness and temperature rank-

ings. The individual bands, NDVI and surface temperature

are ranked; lowest to highest for visible and infrared bands,

highest to lowest for NDVI and surface temperature. From

these rankings a set of metrics is derived. The bands are also

ordered according to highest and lowest corresponding

NDVI and surface temperature values, and metrics are

derived based on these orderings. Metrics results such as

near-infrared reflectance at maximum annual NDVI, or

mean NDVI of the four warmest surface temperature

months are used. Table 1 shows metrics for an example

using a red reflectance band.
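Metrics that order one band by the ranking of another, such as near-infrared reflectance at maximum annual NDVI or mean NDVI of the four warmest months, might be computed along these lines. The helper names `value_at_peak` and `mean_of_top_months` and the monthly values are hypothetical, chosen only to illustrate the ordering step.

```python
import numpy as np

def value_at_peak(band, ranking):
    """Per-pixel value of `band` in the month where `ranking` is highest."""
    idx = np.argmax(ranking, axis=0)  # (rows, cols) month index
    return np.take_along_axis(band, idx[None, ...], axis=0)[0]

def mean_of_top_months(band, ranking, k=4):
    """Per-pixel mean of `band` over the k months with the highest `ranking`."""
    order = np.argsort(ranking, axis=0)[-k:]  # indices of the k highest months
    return np.take_along_axis(band, order, axis=0).mean(axis=0)

# Made-up monthly series for a single pixel, shaped (12, 1, 1).
ndvi = np.array([0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 0.85, 0.8, 0.6, 0.4, 0.3, 0.2]).reshape(12, 1, 1)
nir = np.array([20, 22, 25, 30, 33, 36, 35, 34, 28, 24, 22, 20], dtype=float).reshape(12, 1, 1)
temp = np.array([290, 292, 296, 300, 303, 305, 306, 304, 299, 295, 292, 290], dtype=float).reshape(12, 1, 1)

nir_at_max_ndvi = value_at_peak(nir, ndvi)[0, 0]
ndvi_warmest4 = mean_of_top_months(ndvi, temp, k=4)[0, 0]
print(nir_at_max_ndvi, ndvi_warmest4)
```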

2.2. Continuous training data

Past training data were created by classifying and inter-

preting high-resolution imagery to identify homogeneous

areas. These areas were then aggregated to develop a coarse

resolution training data set for a discrete classification

system, the modified International Geosphere Biosphere

Programme’s (IGBP) University of Maryland land cover

legend (DeFries, Hansen, Townshend, & Sohlberg, 1998;

Hansen et al., 2000). The 12 classes in this legend can be

aggregated to four tree cover strata. These strata are 0–10%,

11–40%, 41–60% and 61–100% tree canopy cover. In the

new approach, the high resolution classifications are aggre-

gated to coarser scales by labeling each stratum with a mean

cover value (0%, 25%, 50% and 80% for the aforemen-

tioned classes) and then averaging over the coarser output

cells. In this way a continuous tree cover training data set is

created. Fig. 3 shows the approach for deriving the current

global training data set for an example from the Democratic

Republic of the Congo.
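A minimal sketch of the aggregation step, assuming the high-resolution classification is stored as integer class codes (the codes 0 to 3 are hypothetical; the mean cover values 0%, 25%, 50% and 80% are those given above). Each coarse cell receives the average of the mean cover values of the fine pixels it contains.

```python
import numpy as np

# Mean cover per stratum from the text; the integer class codes are an assumption.
STRATUM_COVER = {0: 0.0, 1: 25.0, 2: 50.0, 3: 80.0}

def aggregate_to_cover(classes, factor):
    """Average stratum mean-cover values over factor x factor blocks."""
    lut = np.array([STRATUM_COVER[k] for k in sorted(STRATUM_COVER)])
    cover = lut[classes]
    rows, cols = cover.shape
    rows -= rows % factor  # drop partial blocks at the edges
    cols -= cols % factor
    blocks = cover[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

classes = np.array([[3, 3, 0, 0],
                    [3, 2, 0, 1],
                    [1, 1, 2, 2],
                    [0, 0, 3, 3]])
coarse = aggregate_to_cover(classes, 2)
print(coarse)
```

The upper-left block, for example, contains three forest pixels and one woodland pixel, giving (80 + 80 + 80 + 50) / 4 = 72.5% tree cover.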

Thus, the new approach includes the use of training

pixels of intermediate cover, whether they are homogeneous

open woodlands or fragmented forest. This is an improve-

ment over spectral end members, which employ only

signatures characteristic of pure class types. As prior work

was based on identifying core, homogeneous areas for all

cover classes, a new training data set had to be assembled.

The archival data sets were re-interpreted wall-to-wall,

where possible, to acquire training in mixed areas. This

allows for a more consistent depiction of transition areas and

ecotones which are of interest to many researchers of land

cover change. An important effect of the continuous training

is the increased ability to automate the procedure. By having

the full range of tree cover heterogeneity for training, the

algorithm produces more stable results.

2.3. Regression tree algorithm

Regression trees have previously been used with remote

sensing data (DeFries et al., 1997; Michaelson, Schimel,

Friedl, Davis, & Dubayah, 1994; Prince & Steininger,

1999). They offer a robust tool for handling nonlinear

relationships within remotely sensed data sets. The algo-

rithm uses a set of independent variables, in this case annual

multi-temporal metrics, to recursively split a dependent

variable, in this case tree cover, into subsets which max-

imize the reduction in the residual sum of squares. The

algorithm uses only those metrics which best separate the

tree strata. In this way, unlike unsupervised classifiers,

metrics that provide no discriminatory information are

ignored. For example, the individual months of Fig. 2

may not be used at all, since the derived index of minimum

red reflectance best depicts tree cover information.

Table 1

This table shows examples of metrics derived for the red reflectance band. Ranking criteria: each band is individually ranked and also ordered based on NDVI and surface temperature rankings.

| Metric types | Ranking of individual bands | Greenest based on NDVI | Warmest based on surface temperature |
|---|---|---|---|
| Individual monthly values | minimum, median and maximum annual red reflectance | red reflectance associated with peak, median, minimum greenness | red reflectance associated with peak, median and minimum surface temperature |
| Means | mean of four, six and eight darkest red reflectance monthly values | mean red reflectance of four, six and eight greenest months | mean red reflectance of four, six and eight warmest months |
| Amplitudes | amplitude of red reflectance for minimum, median and maximum red values | amplitude of red reflectance associated with peak, median, minimum greenness | amplitude of red reflectance associated with peak, median, minimum surface temperature |

The same metrics are calculated for other bands and NDVI. For AVHRR, bands 1–5 were used; for MODIS, bands 1–7 and surface temperature will be used.

All input metrics are analyzed across digital number

values and right and left splits are examined. The split that

produces the greatest reduction in the residual sum of

squares, or deviance, is used to divide the data and the

process begins again for the two newly created subsets. The

regression tree algorithm takes the following form:

$$D = D_s - D_t - D_u$$

where s represents the parent node, and t and u are the splits

from s. The deviance for nodes is calculated from the

equation:

$$D_i = \sum_{\text{cases}(j)} \left(y_i - u_{[j]}\right)^2$$

for all j cases of y and the mean value of those cases, u.
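The split search described above can be illustrated for a single metric. This is a simplified sketch, not the software used in the paper: `deviance` and `best_split` are hypothetical names, candidate thresholds are taken as midpoints between successive metric values, and the reflectance and cover values are invented.

```python
import numpy as np

def deviance(y):
    """Residual sum of squares of y about its own mean."""
    y = np.asarray(y, dtype=float)
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0

def best_split(x, y):
    """Return (threshold, deviance reduction) for the best binary split of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    parent = deviance(y)
    best = (None, 0.0)
    values = np.unique(x)
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2.0
        left, right = y[x <= threshold], y[x > threshold]
        reduction = parent - deviance(left) - deviance(right)  # parent minus both children
        if reduction > best[1]:
            best = (threshold, reduction)
    return best

# Made-up training pixels: dark (low red reflectance) pixels carry high tree cover.
red = [3, 4, 5, 12, 14, 15]
cover = [80, 75, 70, 10, 5, 0]
threshold, reduction = best_split(red, cover)
print(threshold, reduction)
```

A full tree would repeat this search over every metric at each node and recurse into the two subsets created by the winning split.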

Fig. 4. Example of tree cover mapping methodology. (a) Scatter of 1999 8-km global tree cover training data where the feature space is minimum annual red reflectance on the y-axis and minimum annual near-infrared reflectance on the x-axis with derived NDVI from these two values also used; (b) node partitions and node numbers derived from the pruned regression tree; (c) mean node estimates resulting from the regression tree; (d) per node stepwise regression estimates; (e) per node median adjustment results. In addition to slightly improving the root mean square error estimates, the last two steps in (d) and (e) create a more continuous result and improve depictions in extreme low and high cover nodes. Refer to Fig. 5 to see the actual tree structure.

Our implementation of the regression tree algorithm is

performed as follows. Two samples of training pixels are

taken from the training data set. One is used to grow the

regression tree and one to prune it. Pruning is required

because tree algorithms are very robust and delineate even

individual pixels isolated in spectral space. By having a set-aside

of training data, a more generalized tree can be

generated. This generalization is achieved by passing the

second sample of data down the initial tree. As the data

cascade down the tree, the overall sum of squares begins to

level out and eventually begins to increase. This indicates

an overfitting of the initial tree. For this work, pruning is

performed not where the sum of squares begins to increase,

but where additional nodes represent a reduction of less

than 0.01% of the overall sum of squares for the data. The

end result is an easily interpreted hierarchy of splits, which,

when followed, allow for a ready biophysical interpretation

of the relationship between vegetation cover and satellite

signal.
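One possible reading of the 0.01% pruning rule is sketched below. The text does not specify how candidate subtrees are ordered, so the input layout here is an assumption: `heldout_deviance[k]` is taken to be the residual sum of squares of the set-aside sample for the subtree with k + 1 terminal nodes, and `prune_size` is a hypothetical helper name.

```python
def prune_size(heldout_deviance, total_ss, min_fraction=1e-4):
    """Return the number of terminal nodes to keep.

    heldout_deviance[k] is the set-aside residual sum of squares for the
    subtree with k + 1 terminal nodes (assumed layout). Growth stops at the
    first extra node that removes less than min_fraction (0.01%) of total_ss.
    """
    for k in range(1, len(heldout_deviance)):
        gain = heldout_deviance[k - 1] - heldout_deviance[k]
        if gain < min_fraction * total_ss:
            return k  # keep the subtree with k terminal nodes
    return len(heldout_deviance)

# Made-up deviances: the fourth node removes only 50 of 1,000,000 (0.005%).
deviances = [1_000_000, 400_000, 250_000, 249_950, 249_940, 250_100]
kept = prune_size(deviances, total_ss=1_000_000)
print(kept)
```

Note that in this toy sequence the cutoff is reached before the set-aside deviance starts rising again at the last entry, matching the statement that pruning is applied earlier than the point of increase.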

An additional step is the fitting of a linear regression

model to the data in each node. The regression tree output

yields a mean cover value based on training pixels present

in each node. However, the predicted values can be

improved by running a linear model using the independent

variables to predict tree cover for each node. This is done

by using a stepwise regression procedure per node in order

to use the combination of image data which best explains

tree cover variation. This step represents a fine-tuning of the

result to produce a more continuous product and does not

greatly change the regression tree results. For example,

from Fig. 3, the regression tree might use the temperature

metric to separate the forest from the woodlands. Then

metrics such as maximum annual NDVI would be used in

the stepwise regression phase to improve the mean node

estimates.
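The per-node refinement can be sketched as a forward stepwise selection on the pixels of one node. The paper does not state its entry or stopping criteria, so this example assumes a simple rule: a metric is added only if it removes more than 1% of the node's total sum of squares. The function names and the data are illustrative.

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit with intercept; return (coefficients, residual sum of squares)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the metric that most reduces RSS (assumed stopping rule)."""
    chosen = []
    _, total = fit_rss(np.empty((len(y), 0)), y)  # intercept-only fit = node mean
    rss = total
    while len(chosen) < X.shape[1]:
        trials = {j: fit_rss(X[:, chosen + [j]], y)[1]
                  for j in range(X.shape[1]) if j not in chosen}
        j_best = min(trials, key=trials.get)
        if rss - trials[j_best] <= min_gain * total:
            break
        chosen.append(j_best)
        rss = trials[j_best]
    coef, _ = fit_rss(X[:, chosen], y)
    return chosen, coef

# Made-up node: cover depends on metrics 0 and 1 but not on metric 2.
X = np.column_stack([[0, 1, 2, 3, 4, 5, 6, 7],
                     [1, 0, 1, 0, 1, 0, 1, 0],
                     [3, 1, 4, 1, 5, 9, 2, 6]]).astype(float)
y = 40 + 5 * X[:, 0] - 3 * X[:, 1]
chosen, coef = forward_stepwise(X, y)
print(chosen, np.round(coef, 3))
```

With no metrics selected the model reduces to the node mean, which is the plain regression tree estimate; the selected metrics then vary the prediction within the node.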

Fig. 5. Tree structure from Fig. 4, which employs 1999 minimum red and near-infrared reflectances and derived NDVI for 8-km Pathfinder AVHRR data. Training data are resampled from the high-resolution classifications to the 8-km grid. Ellipses represent nonterminal nodes; rectangles, terminal nodes. Inside nodes are mean tree cover estimates based on 50% sample used to grow tree. Splitting rules are shown under nonterminal nodes. Terminal node numbers match those in Fig. 4b.

Table 2

Node statistics for example tree in Figs. 4 and 5

| Node | Training mean | Standard deviation | Median | Number of pixels |
|---|---|---|---|---|
| 1 | 42.0 | 20.5 | 43 | 32 |
| 2 | 58.1 | 15.1 | 62 | 282 |
| 3 | 63.5 | 16.1 | 65 | 46 |
| 4 | 68.5 | 8.9 | 70 | 1543 |
| 5 | 11.1 | 7.6 | 10 | 15 |
| 6 | 37.1 | 15.9 | 27 | 190 |
| 7 | 26.7 | 11.5 | 27 | 337 |
| 8 | 45.2 | 13.7 | 42 | 110 |
| 9 | 55.5 | 11.1 | 53 | 337 |
| 10 | 42.7 | 10.4 | 42 | 228 |
| 11 | 37.1 | 8.8 | 39 | 269 |
| 12 | 17.8 | 9.5 | 14 | 256 |
| 13 | 30.0 | 8.9 | 34 | 218 |
| 14 | 21.6 | 9.2 | 26 | 649 |
| 15 | 13.1 | 7.3 | 10 | 889 |
| 16 | 0.4 | 2.0 | 0 | 9001 |
| 17 | 8.7 | 6.1 | 9 | 1604 |
| 18 | 5.4 | 5.5 | 2 | 1095 |

Fig. 6. (a) Percent tree cover map automatically generated using global 1-km AVHRR data from 1995–96 data and (b) subset of preliminary linear endmember mixture model approach for an area of New York state; (c) same area for new approach; (d) preliminary approach for an area in Mato Grosso state, Brazil; (e) same area for new approach.

Many nodes at the extremes of tree cover extent have

skewed data distributions. While the regression tree yields

suitable splits in these instances, the use of the mean value

in assigning a cover value may reduce values at the high

cover end and increase values for extremely low cover

nodes. A simple solution to this is to adjust the final node

values by adding the median minus the mean for each node.

Again, this represents a subtle adjustment to the final

product, but experiments with the procedure show that it

slightly improves overall root mean square errors and high

and low end cover estimates.
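The median adjustment amounts to shifting every estimate in a node by that node's training median minus its training mean. A minimal sketch follows, with a hypothetical function name and invented training values; the final clipping to the 0 to 100% range is an added assumption, not something stated in the text.

```python
import numpy as np

def median_adjust(predicted, node_ids, training_cover, training_nodes):
    """Shift each estimate by (median - mean) of the training cover in its node."""
    adjusted = np.asarray(predicted, dtype=float).copy()
    for node in np.unique(training_nodes):
        values = training_cover[training_nodes == node]
        shift = np.median(values) - values.mean()
        adjusted[node_ids == node] += shift
    return np.clip(adjusted, 0.0, 100.0)  # assumption: keep estimates within 0-100%

# Made-up training pixels: node 16 is skewed toward zero, node 4 is symmetric.
training_cover = np.array([0, 0, 0, 2, 8, 60, 70, 70, 72, 78], dtype=float)
training_nodes = np.array([16, 16, 16, 16, 16, 4, 4, 4, 4, 4])

predicted = np.array([2.0, 3.0, 70.0])
node_ids = np.array([16, 16, 4])
adjusted = median_adjust(predicted, node_ids, training_cover, training_nodes)
print(adjusted)
```

In the skewed low-cover node the mean (2%) sits above the median (0%), so estimates are pulled down by 2 points, while the symmetric node is left unchanged.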

Fig. 4 shows a graphic representation of the procedure.

This example uses actual inputs, but is a simplified illus-

tration to aid understanding of the procedure. Three input

metrics, minimum annual red and near infrared reflectances

and derived NDVI from 1999 AVHRR data, are used as the

independent variables. The training data are from the global

training set aggregated to 8-km resolution. The 50% sample

used to grow the tree created a 2954 node tree when

perfectly fit to the scatter in Fig. 4a. Using the other 50%

of data to prune and find the 0.01% cutoff threshold, an 18-

node tree is derived as shown in Figs. 4b and 5. The overall

mean of the training data is 14.2% tree cover as can be seen

in the root node in Fig. 5. Using this estimate for all pixels

yields a root mean square error (rmse) of 17.73%. The mean

estimates from the 18 nodes reduce the rmse to 3.43%. The

next steps of stepwise regression and median adjustment

lower this value to 3.35% and 3.31%, respectively. Thus, the

most significant predictor is the original pruned tree itself,

while the subsequent steps create a more continuous and

slightly improved result.

The tree structure and associated node statistics are

informative since trees allow for meaningful interpretation

from a biophysical perspective. The first three splits in the

tree use red reflectance, indicating the importance of this

metric in tree mapping. The combined effects of chlor-

ophyll absorption and canopy shadowing in the visible red

wavelengths are most significant among these variables in

discriminating dense tree cover. Node 5 is an example of a

low tree cover node which could be associated with burns

as it has both very low red reflectance and NDVI. Table 2

shows statistics for each node. Note that the mean node

values are slightly different than those of the tree in Fig. 5,

because the tree is originally defined using a 50% sample

whereas the Table 2 statistics include all pixels. In this

table, nodes with great variability represent inseparable

signatures. Increasing the feature space by adding metrics

might be required in this instance to enhance separability.

An arc of increased inseparability is seen across the feature

space for nodes 1, 2, 3, 6, 7, 8, 9 and 10. This type of

information is useful, especially for change detection

studies because it allows for an assignment of confidence

which can be employed to measure change. For instance,

given two successive time periods and similar tree struc-

tures, only pixels which started and ended in the high

confidence zones above and below this low confidence arc

would be labeled as changed pixels. Only node 6 exhibits

a significant degree of skewing. The mean and median are

fully 10% apart. This node represents a bimodal distribu-

tion which is inseparable and best estimated by adjusting

node values using the median.
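The confidence-based change rule could be expressed as below. The standard deviations are those of Table 2; the 10% cutoff separating high- and low-confidence nodes and the 25% minimum cover difference are assumptions made for illustration (the cutoff happens to isolate exactly nodes 1, 2, 3, 6, 7, 8, 9 and 10). The sketch also assumes both dates were mapped with the same tree structure.

```python
import numpy as np

# Standard deviation of training cover per terminal node, from Table 2.
NODE_SD = {1: 20.5, 2: 15.1, 3: 16.1, 4: 8.9, 5: 7.6, 6: 15.9, 7: 11.5, 8: 13.7,
           9: 11.1, 10: 10.4, 11: 8.8, 12: 9.5, 13: 8.9, 14: 9.2, 15: 7.3,
           16: 2.0, 17: 6.1, 18: 5.5}

def confident_change(nodes_t1, nodes_t2, cover_t1, cover_t2, max_sd=10.0, min_delta=25.0):
    """Flag pixels that start and end in low-variability nodes and differ in cover."""
    confident = [node for node, sd in NODE_SD.items() if sd < max_sd]
    stable = np.isin(nodes_t1, confident) & np.isin(nodes_t2, confident)
    delta = np.abs(np.asarray(cover_t2, dtype=float) - np.asarray(cover_t1, dtype=float))
    return stable & (delta >= min_delta)

# Made-up pixels: only the first moves between two high-confidence nodes.
changed = confident_change(nodes_t1=[4, 4, 2, 16], nodes_t2=[16, 9, 16, 16],
                           cover_t1=[70, 70, 58, 0], cover_t2=[0, 55, 1, 0])
print(changed)
```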

3. Results

3.1. AVHRR global prototype using MODIS algorithm

The initial attempt to use the regression tree was per-

formed using the AVHRR 1-km data set processed at the

EROS Data Center under the guidance of the IGBP (Eiden-

shink & Faudeen, 1994). Metrics describing the phenolog-

ical variation of vegetation were derived for the year dating

February 1995 to January 1996. This test employed 144

metrics, many derivative of those used in the land cover

classification of Hansen et al. (2000). Table 1 shows an

outline of the metrics used. At 1-km resolution, the training

data consists of nearly 6 million pixels, and a systematic

sampling of roughly every fifth training pixel was taken to

drive the analysis. The final product and improved informa-

tion content in the algorithm can be seen in Fig. 6. A much

more detailed, sharper depiction is shown for subsets

centered on the Hudson River valley, United States and

the upper Xingu River valley, Brazil as compared to the

initial methodology. The previous methodology using end-

members in a linear model tends to overestimate forest

cover at the high end. This is due to the small dynamic range

of dense tree cover (f >40%) for many metrics, such as the

red reflectance metric shown in Fig. 4. The linear model

tends to flatten tree cover variability, which is captured in

the regression tree approach.

The initial regression tree mean cover values for 189,092

pixels yielded an rmse of 9.28% compared to the training data.

After applying the regression models to each node, the rmse

was reduced to 9.06% tree cover. The final scaling using the

median adjustment also resulted in an rmse of 9.06%.

Comparisons of the training values to results for both

methodologies are listed in Table 3. The average rmse

values indicate a more robust result across all strata with

the new algorithm.

Table 3

Comparison of global continuous training pixel values with results from two approaches depicting tree cover, the linear mixture approach of DeFries et al. (2000) and the regression tree approach planned for use with MODIS data

| Tree cover strata | Linear mixture model + classification (%) | Regression tree algorithm (%) |
|---|---|---|
| 0–10 | 5.5 | 4.37 |
| 11–25 | 16.9 | 11.9 |
| 26–40 | 18.3 | 13.4 |
| 41–60 | 15.8 | 13.8 |
| 61–100 | 9.4 | 10.3 |
| average rmse | 13.8 | 10.8 |
| overall rmse | 10.6 | 9.1 |

Fig. 7. (a) Continuous tree cover training at 250-m resolution used to create test map. (b) Test product of tree cover for the conterminous United States from two maximum NDVI composites from data between June 10 and July 27, 2000 and between October 7 and October 31, 2000.

3.2. Conterminous United States 250-m tree cover map from 2000 summer and fall maximum NDVI composites

To test the procedure further and to examine the robustness

of the MODIS data, a preliminary United States tree

cover map was made using two maximum NDVI composites

from available summer and fall data for the year 2000.

The high-resolution training data resampled to the 250-m

MODIS cell size resulted in over 20 million training pixels

for the contiguous United States alone. The 250-m training

data are shown in Fig. 7. A 1% sample of these sites was

randomly taken.
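The two ways of thinning the training data mentioned in this paper, a systematic sample of roughly every fifth pixel for the 1-km test and a 1% random sample for the 250-m test, can be sketched as index selections. The function names, the fixed seed and the small pixel counts are assumptions for the example.

```python
import numpy as np

def systematic_sample(n_pixels, step=5):
    """Indices of every `step`-th training pixel."""
    return np.arange(0, n_pixels, step)

def random_sample(n_pixels, fraction=0.01, seed=0):
    """Indices of a random subset drawn without replacement (seed is arbitrary)."""
    rng = np.random.default_rng(seed)
    size = int(round(n_pixels * fraction))
    return np.sort(rng.choice(n_pixels, size=size, replace=False))

every_fifth = systematic_sample(100)
one_percent = random_sample(20_000)
print(len(every_fifth), len(one_percent))
```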

The 250-m bands were chosen to be included in the

MODIS sensor as Townshend and Justice (1988) found this

to be the resolution necessary to depict human-induced land

cover change. It is clear from much of the MODIS 250-m

raw imagery that this was a useful choice. When viewing

raw swaths, many forest clearings and other features asso-

ciated with human activity are plainly visible. However,

when comparing the raw inputs to a maximum NDVI

composite, it is clear that a lot of this information is lost.

Fig. 8 shows NDVI data from the MODIS 250-m bands.

The raw swath has a great amount of detail present, which is

lost or blurred in the autumn composited image used to

make the country-wide product. Small clearings and water

courses in the Congaree bottomland hardwood forest, which

appears as the bright fork shape in the center of the images,

are plainly visible in the L1B data, but not in the composite.

This composite is not an official MODIS product (Huete

et al., 2002, this issue), but a simple test to observe the

quality of a traditional procedure. It is possible that the

blurring is related to geolocation errors or the inclusion of

extreme view angle values, which may be easily corrected.

However, it is apparent that compositing issues are critical

to maximizing the usefulness of MODIS data. In past work,

the AVHRR sensor’s resolution of 1.1 km did not allow for

the depiction of such detail, and the effects of compositing,

while well characterized by many (Cihlar, Manak, &

D’Iorio, 1994; Holben, 1986; Moody & Strahler, 1994),

did not appear to result in such a potentially dramatic loss of

information. That is because the original resolution and

sensor characteristics of the AVHRR captured an image

which was too coarse to view many of the features which

are visible with MODIS. Compositing is now of increased

importance, as blurring can undermine the usefulness

of the data in change detection studies.
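The maximum NDVI compositing discussed above can be sketched as follows. This is an illustrative toy version, not the procedure used to build the composite in Fig. 8: the data layout (one list of red and near-infrared reflectance pairs per date) and the function names are assumptions made for the example.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one observation."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def max_ndvi_composite(observations):
    """For each pixel, keep the observation with the highest NDVI.

    observations: list of dates, each a list of (red, nir) tuples per pixel.
    Returns one (red, nir, date_index) tuple per pixel.
    """
    n_pixels = len(observations[0])
    composite = []
    for p in range(n_pixels):
        best_date = max(range(len(observations)),
                        key=lambda d: ndvi(*observations[d][p]))
        red, nir = observations[best_date][p]
        composite.append((red, nir, best_date))
    return composite


# Three hypothetical dates, two pixels each: (red, nir) reflectances.
dates = [
    [(0.10, 0.30), (0.08, 0.40)],
    [(0.05, 0.45), (0.20, 0.25)],
    [(0.12, 0.20), (0.06, 0.42)],
]
composite = max_ndvi_composite(dates)
```

In this toy case the two neighbouring pixels are drawn from different dates. Because each date may carry its own geolocation offset and view angle, a composite assembled this way can mix observations that do not line up spatially, which is one plausible route to the blurring noted in the text.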

3.3. North America 500-m tree cover and leaf type products

The operational MODIS algorithm was implemented on

4 months of 500-m data (Julian days 305–337 for 2000 and

81–153 of 2001) for North America. This is the resolution

of the official MODIS continuous cover products. The time

periods used capture some seasonality, but do not span

enough of the year to derive useful metrics. A consistently

processed year of data for metric generation was not available

at the time of this study. However, the results of this

preliminary product reveal the robustness of the MODIS

data. The data were compiled into 40-day composites and

the training data binned to the 500-m MODIS Integerized

Sinusoidal grid. The 500-m data were sampled in a similar

fashion to the 1-km AVHRR by taking every tenth pixel to

reduce data volumes. A final tree of 90 nodes was created

from the 24 input channels (bands 1–7 and NDVI for three

40-day composites). The initial node estimates yielded an

rmse for the 82,082 training pixels of 11.07% tree cover,

which was reduced to 10.32% and 9.93% after regression

and median refinements. The result is shown in Fig. 9.

Fig. 8. (a) Maximum NDVI composite for October 2000 from tiled MODIS 250-m data for an area in South Carolina. Columbia is at left center of the image. (b) NDVI derived from raw level 1B 250-m data for October 12, 2000.

Fig. 9. Preliminary 500-m MODIS percent tree cover map for North America.

Fig. 10. Preliminary 500-m MODIS percent tree leaf type for North America.

Fig. 11. (a) Per-state thresholds at which the area estimate of the 500-m tree cover map matches United States Forest Service estimates. This value is found per state by starting at the highest percent tree cover values in the 500-m map and calculating area totals as the tree cover threshold is lowered. For the 500-m map, the area of tree cover greater than or equal to the threshold value shown yields the same area as estimated by the USFS. (b) Application of weighted mean threshold (35% tree cover) which yields an areal match with the Forest Service data for the lower 48 United States. Gray is tree cover greater than or equal to 35%; black is less than 35%.

Fig. 12. Regional comparisons of threshold matches between 500-m continuous tree cover map and United States Forest Service estimates.
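The every-tenth-pixel sampling and the rmse reported for the 500-m regression tree can be illustrated with a short sketch. It assumes the training pixels are held in a flat sequence and that "every tenth pixel" means a fixed stride of 10; both the layout and the function names are assumptions for illustration. The regression and median refinement steps themselves are not reproduced here.

```python
import math


def every_nth(pixels, step=10):
    """Keep every `step`-th pixel from a flat sequence of training pixels."""
    return pixels[::step]


def rmse(predicted, observed):
    """Root mean square error between predicted and observed percent tree cover."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical percent tree cover values for four training pixels.
observed = [0.0, 10.0, 55.0, 80.0]
predicted = [5.0, 10.0, 50.0, 90.0]
error = rmse(predicted, observed)

# Stride sampling of a toy list of 100 pixel indices.
kept = every_nth(list(range(100)))
```

The rmse is expressed in the same units as the dependent variable, so a value of 9.93 means a typical error of about 10 percentage points of tree cover on the training pixels.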

The same procedure was followed for tree leaf type,

resulting in a map of percent needleleaf and broadleaf tree

cover. For training sites with greater than 10% tree cover,

the percent contribution of broadleaf tree cover was used as

training. This yielded 48,105 training pixels. The procedure

was followed as before, and the percent needleleaf was calculated

as the percent total tree cover less the product of the percent

broadleaf (expressed as a proportion) and the percent tree cover.

The result is shown in Fig. 10. The subsets in both Figs. 9

and 10 show the increased detail available with MODIS

compared to AVHRR.
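The needleleaf calculation described above amounts to simple arithmetic per pixel. A minimal sketch follows, assuming the broadleaf estimate is the percent of tree cover that is broadleaf (0 to 100); the function and argument names are hypothetical.

```python
def leaf_type_cover(tree_cover_pct, broadleaf_share_pct):
    """Split percent tree cover into broadleaf and needleleaf percent cover.

    tree_cover_pct: percent of the pixel covered by trees (0-100).
    broadleaf_share_pct: percent of that tree cover that is broadleaf (0-100).
    """
    if not (0 <= tree_cover_pct <= 100 and 0 <= broadleaf_share_pct <= 100):
        raise ValueError("percentages must lie between 0 and 100")
    broadleaf = tree_cover_pct * broadleaf_share_pct / 100.0
    needleleaf = tree_cover_pct - broadleaf  # total less the broadleaf part
    return broadleaf, needleleaf


# A pixel with 60% tree cover, one quarter of which is broadleaf.
broadleaf, needleleaf = leaf_type_cover(60.0, 25.0)
```

For this example pixel the result is 15% broadleaf and 45% needleleaf cover, which sum back to the 60% total tree cover.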

4. Evaluation of preliminary 500-m tree cover for lower 48 United States

The 500-m tree cover map was compared to United States

Forest Service (USFS) statistics for the lower 48 United

States (Powell, Faulkner, Darr, Zhu, & MacCleery, 1992).

Beginning with the densest forest stratum and lowering the

continuous field threshold, a cutoff can be found for which

the forest area estimate of the USFS can be matched. Fig. 11

shows for each state which continuous field threshold yields

an equivalent areal estimate. A mean weighted by USFS state

area estimates was derived, which results in a match for total

forest area for the lower 48 states. A threshold of 35% results

in a total of 2.35 million km² compared to the USFS estimate

of 2.42 million km². The Forest Service definition of forest is

land at least 10% stocked by trees of any size (Powell et al.,

1992), but it also includes formerly tree-covered areas with

plans to be afforested. Fig. 11b shows the resulting forest/

nonforest map after applying this threshold to the continuous

field map. States in Fig. 11a with thresholds below and above

this cutoff will, respectively, under- and overestimate the

USFS figures.
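The threshold-matching procedure and the area-weighted mean can be sketched as below. This is a simplified illustration under stated assumptions: each state is a flat list of percent tree cover values, every pixel has the same nominal area (0.25 km² is used here only as a round figure for a 500-m cell), and a "match" is taken as the first threshold at which the mapped area reaches the reference area. The function names and toy data are hypothetical.

```python
def matching_threshold(tree_cover_pct, pixel_area_km2, target_area_km2):
    """Lower the threshold from 100% until the area of pixels at or above
    it first reaches the reference forest area."""
    for threshold in range(100, -1, -1):
        n_pixels = sum(1 for v in tree_cover_pct if v >= threshold)
        if n_pixels * pixel_area_km2 >= target_area_km2:
            return threshold
    return None  # reference area exceeds the total mapped area


def weighted_mean_threshold(thresholds, weights):
    """Mean of per-state thresholds weighted by reference forest area."""
    return sum(t * w for t, w in zip(thresholds, weights)) / sum(weights)


# Two toy "states" with six pixels each and hypothetical reference areas.
state_a = [90, 80, 60, 40, 20, 5]
state_b = [70, 30, 25, 10, 0, 0]
t_a = matching_threshold(state_a, 0.25, 0.75)
t_b = matching_threshold(state_b, 0.25, 1.0)
national = weighted_mean_threshold([t_a, t_b], [0.75, 1.0])
```

Applying the single weighted threshold to every state then underestimates area where the state's own matching threshold is lower and overestimates it where the state's threshold is higher, as described for Fig. 11a.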

There are many regional differences in terms of which

threshold best matches the USFS state area totals. Fig. 12

shows these findings. For example, the intermountain west,

centered on desert southwest states, has the lowest matching

thresholds of any region. A clear reason for this is the

inclusion of shorter stands of woody cover as forest in the

USFS forest definition. Pygmy pinyon forests, chaparral and

shorter oak scrub are labeled forest in the USFS definition

(Powell et al., 1992). The continuous field implementation

uses a definition of tree as any woody plant in excess of 5 m

in height. Much of the moisture-limited woody cover found

in the western United States does not meet this definition. A

continuous training data set for short woody vegetation is

being developed to augment the tree cover layer.

The corn belt is not a traditional regional subset like the

other regions, but is included here due to the consistently

low threshold found for the dominant corn-producing

states. This could be the result of increased fragmentation

of forest in this area and a confusion in spectral space

between crops and sub-pixel forest which is biased toward

crops. The rest of the Midwest and Great Plains states are

highly consistent, with thresholds at or near 36%. Moving

east, the thresholds increase, with the highest matching

thresholds found in the heavily forested south and

northeast.

These results show that the algorithm is producing

consistent results which compare well with the USFS

statistical database. Such results should be repeatable and

allow for developing thresholds of change detection for

monitoring purposes. This would help augment the labor-

intensive approach to forest area estimation employed by

the USFS. However, calculating area totals can be complicated

by fragmentation, as a pixel with half of its area in

100% tree cover will yield the same cover area estimate as a

uniform, homogeneous 50% woodland pixel. Fragmentation

could be developed as an ancillary layer to improve area

estimates at the sub-pixel level.
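The fragmentation ambiguity can be made concrete with a small numerical sketch. Each coarse pixel is represented as 100 hypothetical sub-pixel cover values. The use of the sub-pixel standard deviation as a fragmentation indicator is an assumption chosen for illustration, not a measure proposed in the text.

```python
import statistics


def pixel_tree_cover(subpixel_cover_pct):
    """Percent tree cover of a coarse pixel as the mean of its sub-pixels."""
    return sum(subpixel_cover_pct) / len(subpixel_cover_pct)


# Half of the pixel fully forested, half with no trees.
fragmented = [100.0] * 50 + [0.0] * 50
# Uniform open woodland at 50% cover throughout.
uniform = [50.0] * 100

cover_fragmented = pixel_tree_cover(fragmented)
cover_uniform = pixel_tree_cover(uniform)

# Hypothetical ancillary indicator: spread of cover within the pixel.
spread_fragmented = statistics.pstdev(fragmented)
spread_uniform = statistics.pstdev(uniform)
```

Both pixels report 50% tree cover, yet their internal spread differs sharply. An ancillary layer carrying this kind of within-pixel information is what would be needed to tell the two cases apart.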

5. Conclusion

The new procedure for depicting a continuous field of

tree cover is an improvement over the prototype approach.

The main advance is that the algorithm is fully automated.

All of the products here were generated using the new

technique and do not include an interpreter’s input. The

continuous field training data have been critical to this

advance by containing signatures across a wide range of

spatial and spectral mixtures. The algorithm is made more

stable in this way as signatures are not derived from only

core cover exemplar sites. The regression tree algorithm is

an advance as well, in that it can handle the nonlinear

relationships present in a global sample of tree cover.

Present work for the 500-m MODIS continuous field layers

includes creating the annual metrics and producing global

tree cover, leaf type and leaf longevity layers. The examples

shown here indicate that MODIS data will be a substantial

improvement over AVHRR in mapping tree cover. The

spatial detail present in MODIS imagery is unprecedented

for satellites of this kind. However, preserving the finest

spatial detail within the compositing process might require

new approaches.

Acknowledgements

This research was funded by the National Aeronautics

and Space Administration under contract NAS596060, grant

NAG59339, and the Earth Science Information Partnership

(ESIP) program under grant NCC5300.


References

Ayensu, E., Claasen, D., Collins, M., et al. (1999). International ecosystem

assessment. Science, 286, 685–686.

Cihlar, J., Manak, D., & D’Iorio, M. (1994). Evaluation of compositing

algorithms for AVHRR data over land. IEEE Transactions on

Geoscience and Remote Sensing, 32, 427–437.

DeFries, R. S., Field, C. B., Fung, I., et al. (1995). Mapping the land surface

for global atmosphere–biosphere models: towards continuous

distributions of vegetation’s functional properties. Journal of

Geophysical Research, 100, 867–920.

DeFries, R. S., Hansen, M., Steininger, M., Dubayah, R., Sohlberg, R., &

Townshend, J. (1997). Subpixel forest cover in Central Africa from

multisensor, multitemporal data. Remote Sensing of Environment, 60,

228–246.

DeFries, R. S., Hansen, M. C., Townshend, J. R. G., Janetos, A. C., &

Loveland, T. R. (2000). A new global 1-km dataset of percentage

tree cover derived from remote sensing. Global Change Biology, 6,

247–254.

DeFries, R. S., Hansen, M. C., Townshend, J. R. G., & Sohlberg, R. S.

(1998). Global land cover classifications at 8 km spatial resolution: the

use of training data derived from Landsat imagery in decision tree

classifiers. International Journal of Remote Sensing, 19, 3141–3168.

Eidenshink, J. C., & Faundeen, J. L. (1994). The 1 km AVHRR global land

data set: first stages in implementation. International Journal of Remote

Sensing, 15, 3443–3462.

Hansen, M. C., DeFries, R. S., Townshend, J. R. G., & Sohlberg, R. (2000).

Global land cover classification at 1 km spatial resolution using a

classification tree approach. International Journal of Remote Sensing, 21,

1331–1364.

Holben, B. N. (1986). Characteristics of maximum-value composite images

from temporal AVHRR data. International Journal of Remote Sensing,

7, 1417–1434.

Huete, A., Didan, K., Miura, T., Rodriguez, E. P., Gao, X., & Ferreira, L. G.

(2002). Overview of the Radiometric and Biophysical Performance of

the MODIS Vegetation Indices. Remote Sensing of Environment, 83,

195–213 (this issue).

IGBP Terrestrial Carbon Working Group (1998). The terrestrial carbon

cycle: implications for the Kyoto Protocol. Science, 280, 1393–1394.

Michaelson, J., Schimel, D. S., Friedl, M. A., Davis, F. W., & Dubayah, R.

O. (1994). Regression tree analysis of satellite and terrain data to guide

vegetation sampling and surveys. Journal of Vegetation Science, 5,

673–696.

Moody, A., & Strahler, A. H. (1994). Characteristics of composited

AVHRR data and problems in their classification. International Journal

of Remote Sensing, 15, 3473–3491.

Powell, D. S., Faulkner, J. L., Darr, D. R., Zhu, Z., & MacCleery, D. W.

(1992). Forest Resources of the United States, 1992. General Technical

Report RM-234. Washington, DC: United States Department of

Agriculture, Forest Service.

Prince, S. D., & Steininger, M. K. (1999). Biophysical stratification of the

Amazon basin. Global Change Biology, 5, 1–22.

Sellers, P. J., Dickinson, R. E., Randall, D. A., Betts, A. K., Hall, F. G.,

Mooney, H. A., Nobre, C. A., Sato, N., Field, C. B., & Henderson-

Sellers, A. (1997). Modeling the exchanges of energy, water and carbon

between continents and the atmosphere. Science, 275, 502–509.

Townshend, J. R. G., & Justice, C. O. (1988). Selecting the spatial

resolution of satellite sensors required for global monitoring of land

transformations. International Journal of Remote Sensing, 9, 187–236.

Townshend, J. R. G., Justice, C. O., Skole, D., Malingreau, J.-P., Cihlar, J.,

Teillet, P., Sadowski, F., & Ruttenberg, S. (1994). The 1 km resolution

global data set: needs of the International Geosphere–Biosphere

Programme. International Journal of Remote Sensing, 17, 231–255.
