This article is also available online at: www.elsevier.com/locate/ecolind
Ecological Indicators 7 (2007) 254–276
Quantifying disturbances in benthic communities—comparison
of the community disturbance index (CDI)
to other multivariate methods
Geir Rune Flaten a,*, Helge Botnen b, Bjørn Grung a, Olav M. Kvalheim a
a Department of Chemistry, University of Bergen, Allegaten 41, N-5007 Bergen, Norway
b Section of Applied Environmental Research, Høyteknologisenteret, N-5020 Bergen, Norway
Received 30 March 2005; received in revised form 3 February 2006; accepted 5 February 2006
Abstract
A multivariate index assessing community stress in marine benthic count data is compared to other multivariate methods.
Both ordinations and clustering methods are included in the comparison and so is AMBI (AZTI marine biotic index). The
community disturbance index (CDI) requires the data to contain undisturbed reference samples. If this requirement is met, the
calculations are carried out in two steps. Firstly, the reference samples are separated and used to build a model representing the
natural variation. Subsequently, the remaining samples are compared to the model and the respective disturbance indices are
calculated. The CDI has the same sensitivity as the traditional multivariate methods. Additionally, one obtains a quantification of
the relative level of disturbance. The comparison is performed using data from monitoring surveys at three different oilfields in
the North Sea: Troll, Ekofisk and Oseberg-Brage. The samples are collected in 1996, 1990 and 1996, respectively.
© 2006 Published by Elsevier Ltd.
Keywords: Multivariate analysis; Environmental monitoring; Marine benthic analyses; Community disturbance index (CDI); Pollution
* Corresponding author. Fax: +47 55 58 94 90.
E-mail address: [email protected] (G.R. Flaten).
1470-160X/$ – see front matter © 2006 Published by Elsevier Ltd.
doi:10.1016/j.ecolind.2006.02.001
1. Introduction
Marine benthic communities are sensitive to
natural and anthropogenic factors in the local
environment. This is utilised in environmental impact
surveys where benthic fauna is widely used as an
indicator of pollution (e.g. Olsgard and Gray, 1995).
There are several ways to extract information from the
data collected in such surveys. Various indices may be
used, e.g. Shannon–Wiener (Shannon and Weaver,
1949), Pielou (Pielou, 1966) and Hurlbert (Hurlbert,
1971). All of them assign a single value (index) to each
sample. The indices are calculated from the abundance
of species and their distribution without considering
the species assemblage. Accordingly, community
changes due to differences in the species assemblage
are not detected by these indices.
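The limitation described above can be illustrated with a minimal sketch. The two samples are hypothetical, and the logarithm base (2) is an assumption, since the text does not state which base is used. The functions follow the standard definitions of the Shannon–Wiener and Pielou indices.

```python
import math

def shannon_wiener(counts, base=2.0):
    """Shannon-Wiener diversity H' = -sum(p * log(p)) over the species present."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

def pielou_evenness(counts, base=2.0):
    """Pielou evenness J' = H' / log(S), with S the number of species present."""
    s = sum(1 for c in counts if c > 0)
    if s < 2:
        return 0.0
    return shannon_wiener(counts, base) / math.log(s, base)

# Hypothetical samples: identical abundance distribution, different species.
sample_a = [10, 5, 0, 1]   # species 1, 2 and 4 present
sample_b = [0, 10, 5, 1]   # species 2, 3 and 4 present

h_a = shannon_wiener(sample_a)
h_b = shannon_wiener(sample_b)
```

Both samples receive the same index value although their species assemblages differ, which is the kind of community change a univariate index cannot register.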
Multivariate approaches utilise the species identity
in addition to their abundances. Thus, they are more
sensitive in detecting community changes. Generally,
ordinations are applied to rank the samples according
to their similarity, e.g. principal component analysis
(PCA) (Hotelling, 1933; Wold et al., 1987) and
correspondence analysis (CA) (Hirschfeld, 1935;
Jongmann et al., 1995). Groupings (clustering) of
the samples using a dissimilarity matrix, as in
hierarchical clustering (Everitt, 1980; Clarke and
Warwick, 1994) and in multidimensional scaling
(Kruskal, 1964; Shepard, 1962; Borg and Groenen,
1997), are also widely used. The results from these
established multivariate methods are mainly descriptive.
They separate polluted and unpolluted samples,
but it is not obvious how to use these methods to
quantify the level of disturbance.
The simple interpretation provided by the various
indices and the sensitivity of multivariate methods can
be combined by a modelling approach (Massart et al.,
1996). A set of reference samples representing the
natural variation in the community under investigation
is selected from the data set. The selection of reference
samples is crucial since the further analyses relate to
the model calculated from these samples. Usually, the
analyst uses a priori knowledge or preliminary
ordinations to choose reference samples, but there
are also ways of validating the chosen reference set
(Flaten, 2002; Flaten et al., 2004). Based on the
reference samples a multivariate model is built. The
deviation from this model is normalised and used as an
index of stress in the benthic community, called
community disturbance index (CDI) (Massart, 1997).
In the application discussed in this work, the stress is
interpreted as the effect from possible pollution.
A CDI equal to 1.0 defines the boundary of the
model. Samples within the model boundaries have
CDI-values lower than 1.0, while the samples outside
have CDI-values higher than 1.0. All samples are
compared with the model and CDI-values are
calculated. The polluted samples contain a different
species assemblage and distribution compared to the
natural variation described by the model, thus they
locate outside the boundary of the model. The
unpolluted samples locate within the boundary of
the model since their species assemblage is similar to
the reference set samples. Accordingly, the polluted
samples are identified and their level of disturbance
quantified. However, care must be taken as the CDI
does not distinguish between pollution and natural
environmental differences. All conclusions about
pollution should therefore be confirmed, e.g. by
consulting the species abundance tables for the
suspected sites.
CDI is a multivariate index. Another multivariate index, AMBI
(AZTI marine biotic index), was recently
introduced (Borja et al., 2000). AMBI is based on
species lists in which each species is assigned in
advance to one of five ecological groups according
to its sensitivity to an increasing pollution gradient.
A continuous series of values from 0 to 6 is obtained
by using a pre-defined formula based upon the
percentages of abundance of each ecological group.
Another type of multivariate index is multi-metric
indices. The fundamental idea is that the various
traditional [univariate] indices can describe different
aspects of the benthic community, and by combining
them an index that gives a better representation of the
ecological status is obtained (e.g. Rosenberg and
Resh, 1993). However, this implies that a multi-metric
index is only multivariate in the sense of using the
species identity in addition to their abundances if one
or more of the included indices capture this
information. Multi-metric indices are not further
discussed in this paper as the present applications
are region specific (Diaz et al., 2004) and none are
proposed for the North Sea. W-statistics (Clarke, 1990)
is an index which is often regarded as multivariate
although it is a statistic used in the comparison of
dominance curves. Hence, since dominance curves
can be thought of as ‘‘an intermediate between
univariate summaries and a full multivariate analysis’’
(Clarke, 1990), W-statistics is not a multivariate index
with respect to the definition used in this work and
accordingly not further discussed.
The focus on assessment methods for ecological
systems has intensified after the publication of the
European Water Framework Directive (2000/60/EC).
This directive challenges the EC members to
categorize their water resources including coastal
waters in five categories ranging from high status
(‘‘undisturbed conditions’’) to bad status (‘‘severe
alterations’’). Benthic invertebrate fauna is one of the
biological quality elements that can be used in the
assessment and obviously metrics that give a correct
and quantitative description of the ecological status
are needed. A thorough discussion of the Water
Framework Directive implications for assessment
methods is outside the scope of this paper although
some considerations are included in Section 5.
In the present work, the performance of the CDI is
compared to traditional multivariate approaches and
AMBI. In the first part of the theory section a brief
introduction to the tested multivariate methods is
given. The last part of the theory section contains a
thorough presentation of the CDI approach. This
includes a presentation of the modelling approach and
a suggestion of how CDI-values can be regarded in a
statistical context, as the CDI approach is expected to
be unfamiliar to the reader. In the results and
discussions sections comparisons are made for three
different data sets collected during environmental
monitoring surveys in the vicinity of oil platforms in
the North Sea. The data sets are chosen due to the
homogeneous natural environment at each field and the
single source of disturbance, which creates a gradient
of decreasing disturbance with increasing distance to
the oil platform. These features simplify the validation
of the analysis results, although it is expected that
CDI-values can be used to quantify the environmental
stress or pollution level, in any surveys where samples
from unpolluted sites (i.e. reference set samples) exist.
2. Theory
2.1. Ordination methods
Ordination methods separate the samples according to
their species assemblage and abundance. The ordinations
are presented in plots where increasing distance
and angular deviations reflect increasing difference
among samples.
2.1.1. Principal component analysis
Principal component analysis (PCA), (Hotelling,
1933; Wold et al., 1987), is a decomposition of the
abundance matrix, X, into k components:
$$X = TP^{t} + E \qquad (1)$$
where T and P are the score and loading matrices,
respectively. Both contain k column vectors. Superscript
t is used to denote transposition. The number of
principal components, k, is chosen according to the
criterion of maximal systematic variance explained.
Often, a k of two or three is sufficient to account for
most of the systematic variance among samples. The
scores and loadings are calculated by demanding
maximal explained variance under the constraint of
mutual orthogonality among the components. The
residual matrix, E, contains the part of X not explained
by the principal components.
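The decomposition in Eq. (1) can be sketched with the NIPALS iteration, one common way of extracting principal components one at a time. The text does not say which algorithm was used, so NIPALS is an assumption here, as are the function name `nipals_pca` and the toy matrix. Centering and other pre-treatment are left out.

```python
import math

def nipals_pca(X, k, tol=1e-12, max_iter=1000):
    """Return (T, P, E) with X = T P^t + E, extracting k components in turn."""
    m, n = len(X), len(X[0])
    E = [row[:] for row in X]           # residual matrix, starts as a copy of X
    T = [[0.0] * k for _ in range(m)]   # scores, m x k
    P = [[0.0] * k for _ in range(n)]   # loadings, n x k
    for a in range(k):
        # Start from the residual column with the largest sum of squares.
        j = max(range(n), key=lambda c: sum(E[i][c] ** 2 for i in range(m)))
        t = [E[i][j] for i in range(m)]
        p = [0.0] * n
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            if tt == 0.0:
                break                   # nothing left to explain
            # Loading: regress the residual columns on the current score vector.
            p = [sum(E[i][c] * t[i] for i in range(m)) / tt for c in range(n)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score: project the residual rows on the normalised loading.
            t_new = [sum(E[i][c] * p[c] for c in range(n)) for i in range(m)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        for i in range(m):
            T[i][a] = t[i]
        for c in range(n):
            P[c][a] = p[c]
        # Deflate: remove the part of X explained by this component.
        E = [[E[i][c] - t[i] * p[c] for c in range(n)] for i in range(m)]
    return T, P, E

# Hypothetical rank-one abundance matrix: one component explains everything.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
T, P, E = nipals_pca(X, 1)
max_residual = max(abs(v) for row in E for v in row)
```

For this toy matrix the residual matrix E is numerically zero after one component, which corresponds to the case where k = 1 accounts for all systematic variance.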
2.1.2. Correspondence analysis
Correspondence analysis (CA) (Hirschfeld, 1935;
Jongmann et al., 1995) is a PCA with weighting:
$$M^{-1} X N^{-1} = T_{ca} P_{ca}^{t} + E \qquad (2)$$
where both M and N are diagonal matrices. The
diagonal in M contains the row sums of the abundance
matrix, X, while N contains the column sums.
$T_{ca}$ and $P_{ca}$ are the scores and loadings matrices,
respectively. They are labelled ‘ca’ to emphasise that
they are different from the PCA scores and loadings.
Two or three components are often sufficient to
explain most of the variation. As in Eq. (1), E is
the residual matrix.
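The weighting on the left-hand side of Eq. (2) amounts to dividing each count by its row sum and its column sum before the decomposition. The sketch below shows only that weighting step; the function name and the toy counts are hypothetical, and it assumes no sample or species has a zero total.

```python
def ca_weighted_matrix(X):
    """Compute M^-1 X N^-1, with M and N holding the row and column sums of X."""
    row_sums = [sum(row) for row in X]
    col_sums = [sum(col) for col in zip(*X)]
    return [
        [x / (row_sums[i] * col_sums[j]) for j, x in enumerate(row)]
        for i, row in enumerate(X)
    ]

# Hypothetical counts: two samples (rows), two species (columns).
X = [[2.0, 2.0], [1.0, 3.0]]
W = ca_weighted_matrix(X)
```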
2.2. Methods based on the dissimilarity matrix
The distances between the sample vectors, called
dissimilarities, can be used for grouping the data.
Bray–Curtis dissimilarities (Bray and Curtis, 1957)
are usually applied when analysing benthic count data,
and they are calculated as:
$$d_{ij} = \frac{\sum_{k=1}^{n} |x_{ik} - x_{jk}|}{\sum_{k=1}^{n} (x_{ik} + x_{jk})}, \quad i, j = 1, \ldots, m \qquad (3)$$
where dij is the dissimilarity between sample i and j, xik
the count of species k in sample i, m the number of
samples in the data set, and n is the number of species.
All the dij-values are collected in a symmetric dissimilarity
matrix, D.
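Eq. (3) and the construction of D can be written out directly. This is a minimal sketch with hypothetical counts. Returning 0 for two empty samples is an assumption, as the text does not cover that case.

```python
def bray_curtis(x_i, x_j):
    """Bray-Curtis dissimilarity between two count vectors, Eq. (3)."""
    num = sum(abs(a - b) for a, b in zip(x_i, x_j))
    den = sum(a + b for a, b in zip(x_i, x_j))
    return num / den if den else 0.0

def dissimilarity_matrix(X):
    """Symmetric m x m matrix D of pairwise Bray-Curtis dissimilarities."""
    m = len(X)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            D[i][j] = D[j][i] = bray_curtis(X[i], X[j])
    return D

# Hypothetical abundance matrix: three samples (rows), three species (columns).
X = [[10, 0, 5], [5, 5, 5], [0, 0, 20]]
D = dissimilarity_matrix(X)
```

Two samples with no species in common get the maximum dissimilarity of 1, while identical samples get 0.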
2.2.1. Multidimensional scaling
Non-metric multidimensional scaling (MDS)
(Kruskal, 1964; Shepard, 1962; Borg and Groenen,
1997) is a low-dimensional representation of the
information in the dissimilarity matrix. An MDS plot
is a mapping of the dissimilarities, dij, into the
corresponding distances, distij(Z), in an MDS space Z
(Borg and Groenen, 1997, p. 31):
$$f: d_{ij} \to \mathrm{dist}_{ij}(Z) \qquad (4)$$
In order to assess the quality of the MDS models
the stress value proposed by Kruskal (1964) is used:
$$\mathrm{Stress} = \sqrt{\frac{\sum (d_{ij} - \mathrm{dist}_{ij}(Z))^2}{\sum d_{ij}^2}} \qquad (5)$$
Stress values measure the correspondence between the
mapping into the MDS space, distij(Z), and the dissimilarities,
dij. The lower the stress value, the better
the representation of the dissimilarities in the MDS
space, i.e. the corresponding MDS plot gives a better
picture of groupings and clusters in the data as captured
by the dissimilarities. A rule-of-thumb is that
stress values above 0.2 indicate poor representation of
the data. Nevertheless, this is only suggestive since
both the numbers of samples and the properties of the
data influence the stress value.
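The stress value can be computed for any given configuration Z. The sketch below evaluates Eq. (5) exactly as printed, comparing the dissimilarities directly with Euclidean distances in Z and summing over all pairs i < j. Non-metric MDS software usually inserts a monotone transformation of the dissimilarities first, which is omitted here. The configuration and dissimilarities are hypothetical.

```python
import math

def stress(D, Z):
    """Stress of configuration Z against dissimilarity matrix D, Eq. (5)."""
    m = len(D)
    num = 0.0
    den = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            dist = math.dist(Z[i], Z[j])   # Euclidean distance in MDS space
            num += (D[i][j] - dist) ** 2
            den += D[i][j] ** 2
    return math.sqrt(num / den)

# Hypothetical two-dimensional configuration of three samples on a line.
Z = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D_exact = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]   # reproduced perfectly by Z
D_off = [[0, 5, 8], [5, 0, 5], [8, 5, 0]]       # one pair misrepresented

stress_exact = stress(D_exact, Z)
stress_off = stress(D_off, Z)
```

The first matrix gives zero stress. The second gives a value just below the 0.2 rule-of-thumb, which shows how a single badly represented pair can dominate the value in a small data set.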
2.2.2. Agglomerative hierarchical clustering
In agglomerative hierarchical clustering (Everitt,
1980; Johnson and Wichern, 1992) the two samples
having the smallest dissimilarity are identified and
joined. The dissimilarities containing the two combined
samples are removed from the dissimilarity
matrix. Subsequently, dissimilarities between the new
cluster of samples and the remaining samples are
calculated and inserted into the dissimilarity matrix.
This is repeated until all the samples are connected.
The sequence of combinations shows the groupings in
the data. This is visualised in a dendrogram.
Group average linking is usually applied to
calculate the new dissimilarities in benthic analyses.
It defines the dissimilarities of the new group as the
mean of the dissimilarities in the preceding groups,
and can be formulated as (Johnson and Wichern,
1992):
$$d_{(UV)W} = \frac{\sum_{i}^{N_{(UV)}} \sum_{k}^{N_W} d_{ik}}{N_{(UV)} N_W} \qquad (6)$$
where d(UV)W is the new dissimilarity when combining
cluster (group) (UV) with cluster (sample) W. dik is the
distance between object i in (UV) and object k in W.
N(UV) and NW are the numbers of items in clusters (UV)
and W.
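The merging procedure and Eq. (6) can be sketched as follows. For brevity the group averages are recomputed from the original dissimilarity matrix at every step instead of updating the matrix in place. For group average linking this gives the same merge sequence. The function names and the dissimilarities are hypothetical.

```python
def group_average(D, cluster_uv, cluster_w):
    """Mean dissimilarity between all members of two clusters, Eq. (6)."""
    total = sum(D[i][k] for i in cluster_uv for k in cluster_w)
    return total / (len(cluster_uv) * len(cluster_w))

def agglomerate(D):
    """Return the merge sequence as (cluster, cluster, dissimilarity) tuples."""
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest group-average dissimilarity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = group_average(D, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical dissimilarities among four samples.
D = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
merges = agglomerate(D)
```

The merge sequence, read from first to last, is what a dendrogram draws from the bottom up.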
2.3. Modelling approach
The modelling approach consists of two steps: a
model describing the natural variation in the survey
area is built. Subsequently, the samples are compared
to this model and the similarity to the model is
quantified by the CDI-values. The samples outside the
boundaries of the model are classified as not belonging
to the model. They have a different species
assemblage, which is either due to pollution or
deviations in the natural environmental factors. The
boundaries of the model are determined by the F-
statistics, as shown later in this section (Eq. (10)).
The least complicated model is one single point. The
distance to this point determines whether a sample
belongs to the model. Confidence intervals are point
models. The upper and lower confidence limits
correspond to the boundaries of the model, while the
centre point represents the model. In Fig. 1a the (1 − α)
confidence interval for the standardised normal distribution
is shown. The confidence interval is the part of
the probability density function labelled (1 − α). The fraction α
of the samples that falls within the shaded tails is outside the
confidence interval and hence outside the model. Observe
that the distance between a sample and the centre point
determines whether that sample lies outside or inside the
confidence interval, i.e. outside or inside the model.
The point model can be extended to a line model,
and such a model is shown in Fig. 1b. Asterisks
indicate the model itself. The boundary for a line
model is a surrounding cylinder and in the figure the
solid lines indicate this boundary. Two hundred
simulated samples are compared with the model in
Fig. 1b. Each of the samples is indicated by a point.
The samples outside the boundary of the model are
emphasised by bold face. Obviously samples outside
the model can also be located in front of or behind the
cylinder which corresponds to the model boundary.
The histogram for the distances between the samples
and the line model in Fig. 1b is shown in Fig. 1c. The
white bars represent the samples within the model
boundaries, while the shaded bars represent the samples
outside the boundaries, i.e. the samples indicated by
bold points in Fig. 1b. By definition, the distance
between a sample and the line model is always positive,
and therefore the histogram only contains positive
numbers. Observe that there is a clear similarity
between the right half of the probability density
function in Fig. 1a and the histogram of the distances
to the model in Fig. 1c. The model is easily extended to
more dimensions.
Fig. 1. The figure outlines the idea of modelling by comparing modelling with the well-known confidence interval. (a) The probability density function
for the standardised normal distribution is shown. The main part, labelled (1 − α), is the part of the distribution within the 100(1 − α)%
confidence interval. The confidence interval can be seen as a point model for the centre point, here zero. (b) A line model (asterisks) for a set of
measurements (points). The boundary of the model (solid cylinder) is found by statistical testing on the residuals for the samples. The distance to
the model determines whether a sample is inside or outside the model. It is a clear analogy to the confidence interval or point model shown in (a).
(c) The distribution of the distances to the model for the samples in (b). The vertical line indicates the boundary of the model, and the bars
representing samples outside the model are shaded. Observe the resemblance with the right half of the probability density function in (a).
2.3.1. Community disturbance index
The calculation of CDI-values is based on a
modelling approach called soft independent modelling
of class analogies (SIMCA) (Wold, 1976). In short, it is
a supervised classification where the models are built
using a lower dimensional representation (PCA) of a
reference set selected a priori. The reference set consists
of the sampling sites representing the natural variation,
i.e. the undisturbed sites. The a priori knowledge used to
select this reference set is generally the results from the
ordinations combined with the lists of the most
abundant species. The ordinations are used to separate
the reference samples, and the abundance lists are used
to identify them. The distance to the oil rig is an
additional parameter, which can be used to select
reference samples in the application presented in this
work as it is reasonable to expect that the pollution
impact decreases with increasing distance to the
pollution source (the oil rig). There are ways for
validating reference sets (Flaten, 2002), but these
approaches are not further discussed in this work.
A PCA model, describing the natural variation, is
built from the reference set (Eq. (1)).1 The number of
components to include in the model, k, is decided by
cross validation (Wold, 1978). In short, cross
validation successively removes groups of samples
from the reference set, builds models of the
remaining samples with various numbers of components
and subsequently predicts the left out samples.
The number of components that gives the smallest sum
of squared prediction errors is used in the model of the
reference set. After the model has been found, new
samples are compared to it using:
$$t_i = x_i P_{ref} (P_{ref}^{t} P_{ref})^{-1} \qquad (7)$$
and
$$e_i = x_i - t_i P_{ref}^{t} \qquad (8)$$
where xi is the sample to be compared to the model,
Pref the loading matrix for the reference set, and ti is
the vector containing the estimated scores for sample
i. The residual, ei, contains the variation in xi not
accounted for by the model. The line model example
in Fig. 1c can be regarded as a SIMCA model
with one component, and in that case the residual
corresponds to the distance from the sample to the
model.
1 Even though it is widely assumed that the PCA requires normality
in the data, the development of principal components does not
require a multivariate normal assumption (Johnson and Wichern,
1992). The benthic count data form sparse integer matrices, but for
modelling purposes PCA seems to handle this kind of data.
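The comparison of a new sample with the reference model in Eqs. (7) and (8) can be sketched in plain Python. The residual is computed as x_i minus t_i times the transposed loading matrix, so that the dimensions agree with Eq. (1). The helper `solve`, the function name `project` and the toy loading matrix are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [A[r][:] + [b[r]] for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[r][k] / M[r][r] for r in range(k)]

def project(x, P):
    """Scores t = x P (P^t P)^-1 and residual e = x - t P^t for one sample x."""
    n, k = len(P), len(P[0])
    # G = P^t P (k x k) and xP = x P (length k).
    G = [[sum(P[c][a] * P[c][b] for c in range(n)) for b in range(k)]
         for a in range(k)]
    xP = [sum(x[c] * P[c][a] for c in range(n)) for a in range(k)]
    # G is symmetric, so solving G t = xP gives t = xP G^-1.
    t = solve(G, xP)
    e = [x[c] - sum(t[a] * P[c][a] for a in range(k)) for c in range(n)]
    return t, e

# Hypothetical one-component model: all natural variation lies along species 1.
P_ref = [[1.0], [0.0], [0.0]]
x_new = [3.0, 4.0, 0.0]
t_new, e_new = project(x_new, P_ref)
```

The abundance of species 2 in `x_new` is not described by the model and ends up entirely in the residual, which is the quantity the RSD is built from.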
To assess whether samples lie within or outside the
boundary of the model, the residual standard deviation
(RSD) is used. Firstly, the RSD-value for each sample
i belonging to the reference set is calculated as:
$$\mathrm{RSD}_i = \sqrt{\frac{(m_{ref} - 1)}{(m_{ref} - k - 1)} \, \frac{e_i^{t} e_i}{(n - k)}} \qquad (9)$$
where ei is the residual for sample i with respect to the k-
dimensional model, mref the number of samples in the
reference set, and n is the overall number of species.
After calculating the RSD for the reference
samples, the boundary of the model, RSDcrit, is
defined as the product of the mean residual standard
deviation, RSDref, and the critical F-value:
$$\mathrm{RSD}_{crit}^{2} = \mathrm{RSD}_{ref}^{2} \, F_{crit} \qquad (10)$$
where Fcrit is from the standard F-distribution using the
selected level of significance, here $F_{0.95}[(n - k), (n - k)(m_{ref} - k - 1)]$,
where n, mref and k are the number of
species, the number of samples in the reference set, and
the number of components in the model, respectively.
The disturbance of each sample (CDI) is quantified
by relating the sample’s RSD to the boundary of the
model, RSDcrit. The RSD value for non-reference
sample i is calculated as:
$$\mathrm{RSD}_i = \sqrt{\frac{e_i^{t} e_i}{n - k}} \qquad (11)$$
where, as before, ei is the residual vector for sample i, k the
number of components in the model, and n is the
number of species. The CDI-value for sample i is
defined as:
$$\mathrm{CDI}_i = \frac{\mathrm{RSD}_i}{\mathrm{RSD}_{crit}} \qquad (12)$$
where RSDcrit is the border of the model as defined
by Eq. (10). Obviously, samples within the model
boundaries have CDI-values smaller than 1.0. How to
interpret different CDI-values is further discussed in
the next section.
2 It is tested whether the variance for the sample is larger than the
variance for the reference samples. This is a complete test since the
variance is related to the model of the natural variation rather than
the mean values for the reference set. The mean values would
require a two-sided test in order to be unambiguous.
The CDI-values correspond to the deviation from
the model for the respective samples. Thus, the CDI-
values are a quantification of the stress or deviation in
the benthic community of the samples compared to the
communities described by the model. In this work, the
reference samples are selected in order to obtain a
reference set that describes the natural variation in a
limited area. Accordingly, CDI-values higher than one
correspond to samples that deviate from the natural
variation, and the size of the CDI reports to which
extent the sample deviates from the model. High CDI-values
can therefore be interpreted as highly polluted
samples. However, it should be emphasised that the
CDI-values incorporate all variation separating samples
from the model including natural variation.
Hence, any conclusions about pollution need to be
confirmed by, e.g., consulting the species table for the
suspect sample.
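The chain from residuals to CDI-values in Eqs. (9)–(12) can be sketched as below. Two points are assumptions rather than statements from the text: the squared reference RSD in Eq. (10) is taken as the mean of the squared RSD-values of the reference samples, and the critical F-value is passed in by the caller (it would normally be looked up for the degrees of freedom given after Eq. (10)). The residual vectors and the value 3.0 are hypothetical.

```python
import math

def rsd_reference(e, m_ref, n, k):
    """RSD for a reference sample, Eq. (9)."""
    ss = sum(v * v for v in e)
    return math.sqrt((m_ref - 1) / (m_ref - k - 1) * ss / (n - k))

def rsd_sample(e, n, k):
    """RSD for a non-reference sample, Eq. (11)."""
    return math.sqrt(sum(v * v for v in e) / (n - k))

def cdi_values(ref_residuals, test_residuals, k, f_crit):
    """CDI for each test sample, Eqs. (10) and (12)."""
    m_ref = len(ref_residuals)
    n = len(ref_residuals[0])
    # Assumption: RSD_ref^2 is the mean squared RSD over the reference set.
    rsd_ref_sq = sum(rsd_reference(e, m_ref, n, k) ** 2
                     for e in ref_residuals) / m_ref
    rsd_crit = math.sqrt(rsd_ref_sq * f_crit)           # Eq. (10)
    return [rsd_sample(e, n, k) / rsd_crit for e in test_residuals]

# Hypothetical residuals: three reference samples, three species, k = 1.
ref_residuals = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
test_residuals = [[0.0, 1.0, 1.0], [0.0, 2.0, 2.0], [0.0, 4.0, 4.0]]
cdi = cdi_values(ref_residuals, test_residuals, k=1, f_crit=3.0)
```

With these numbers the three test samples fall inside the model (CDI 0.5), on its boundary (CDI 1.0) and well outside it (CDI 2.0).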
2.3.2. Statistical interpretation of the community
disturbance index
The community disturbance index can also be
explained with basis in the F-test. Eqs. (10) and (12)
can be combined to obtain:
$$\mathrm{CDI}_i = \frac{\mathrm{RSD}_i}{\mathrm{RSD}_{ref} \sqrt{F_{crit}}} \qquad (13)$$
where RSDi is the distance to the model for sample i,
RSDref the mean residual variation for the samples in
the reference set, and Fcrit is the F-statistic at the
selected confidence level and corrected for degrees of
freedom.
The similarity to the F-test is more apparent if
Eq. (13) is rearranged to:
$$F_{crit} = \frac{\mathrm{RSD}_i^{2}}{\mathrm{RSD}_{ref}^{2}} \, \frac{1}{\mathrm{CDI}_i^{2}} \qquad (14)$$
Generally, F-tests compare the variances of two nor-
mal populations, and the null hypothesis is that the
variances are equal, i.e.:
$$H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad (15)$$
where $\sigma_1^2$ and $\sigma_2^2$ are the variances for the two compared
distributions. Population 1 corresponds to the
set of samples to be tested as possibly polluted and
population 2 corresponds to the reference set describing
the natural variation. In F-tests for real systems
where the true variances are unknown, estimates of the
variances are used. In the CDI calculations the RSD
values are used as estimates for the variance.
The null hypothesis is that the tested sample i has
similar variation as the reference set, i.e. is not
polluted. Rejection of the null hypothesis indicates
that the composition of the tested sample is different
from the natural variation as described by the variation
of the reference samples, at the selected level of
significance. Hence, the sample can be regarded as
polluted although this assumption must be confirmed
by e.g. consulting the species abundance tables.
The hypothesis test can be formulated as:2
$$\text{Reject } H_0: \frac{\sigma_i^2}{\sigma_{ref}^2} = 1 \text{ in favor of } H_1: \frac{\sigma_i^2}{\sigma_{ref}^2} > 1 \text{ if } \frac{\mathrm{RSD}_i^2}{\mathrm{RSD}_{ref}^2} \geq F_{\alpha}[(n - k), (n - k)(m_{ref} - k - 1)] \qquad (16)$$
where α, n, mref and k are the significance level,
number of species, the number of samples in the
reference set, and the number of components in the
model, respectively.
Combining Eqs. (14) and (16) one realises that
a CDI-value (strictly CDI²) larger than 1.0 is a rejection
of the null hypothesis at the selected level of
significance α. Thus, the tested sample probably
belongs to a population different from the modelled
natural variation.
By studying the cumulative probability density
functions, it can be shown that different CDI-value
levels can be interpreted as F-tests at various
significance levels given that the degrees of freedom
are kept constant (Flaten, 2002). Hence, higher
community disturbance indices, which are equal to
larger RSDi values, correspond to more significant
rejection of the null hypothesis. As in all statistical
testing a certain significance level α implies that a fraction 1 − α
of the normal samples can be wrongly classified. Hence
samples just outside the model (CDI-values slightly
higher than 1.0) can be sensitive to the selection of
significance level. In our experience this has not been a
problem as the main features of the CDI-values are
robust with regard to significance level. Another point
to be made here is that the CDI calculations, like all
statistical calculations, depend on the degrees of
freedom. High CDI-values, say CDI > 2, in data sets
with the dimensions as those discussed in this paper
indicate a very significant difference between the
current sample and the model. In a smaller data set the
same CDI-value is a less significant indication of
difference than in the bigger data set, cf. Eq. (16).
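As a worked illustration of how a CDI-value maps onto the F-test, Eqs. (10) and (12) can be combined. Nothing beyond those two equations is assumed, and the value CDI = 2 is the example level mentioned above.

```latex
\frac{\mathrm{RSD}_i^{2}}{\mathrm{RSD}_{ref}^{2}} = \mathrm{CDI}_i^{2} \, F_{crit}
\quad\Longrightarrow\quad
\mathrm{CDI}_i = 2 \;\text{ gives }\; \frac{\mathrm{RSD}_i^{2}}{\mathrm{RSD}_{ref}^{2}} = 4 \, F_{crit}
```

A sample with CDI = 2 thus has an observed variance ratio four times the critical value. How significant that ratio is depends on the degrees of freedom of the F-distribution it is compared with.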
2.4. AZTI marine biotic index
AMBI is a novel index (Borja et al., 2000) based on
existing ecological models. The benthic species are
assigned to one of five ecological groups, GI–GV,
according to their sensitivity to pollution. A continuous
series of values from 0 to 6 is obtained by using a pre-defined
formula based upon the percentages of
abundance of each ecological group:
$$\mathrm{AMBI} = \frac{(0 \times \%GI) + (1.5 \times \%GII) + (3 \times \%GIII) + (4.5 \times \%GIV) + (6 \times \%GV)}{100} \qquad (17)$$
AMBI can be calculated using the free software
provided by AZTI (www.azti.es), and this software
also contains the most recent grouped species list. The
AMBI results presented in this work are calculated
using the species list labelled ‘‘October 2005’’.
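Eq. (17) itself is simple to evaluate once each species has been assigned to a group. The sketch below is not the AZTI software. It only applies the formula to hypothetical per-group abundances, and it assumes that the percentages are taken relative to the individuals that have a group assignment.

```python
# Weights of the five ecological groups in Eq. (17).
AMBI_WEIGHTS = {"GI": 0.0, "GII": 1.5, "GIII": 3.0, "GIV": 4.5, "GV": 6.0}

def ambi(group_counts):
    """AMBI from the number of individuals in each ecological group."""
    total = sum(group_counts.values())
    if total == 0:
        raise ValueError("no individuals with an ecological group assignment")
    weighted = sum(
        AMBI_WEIGHTS[group] * 100.0 * count / total   # weight x percentage
        for group, count in group_counts.items()
    )
    return weighted / 100.0

# Hypothetical sample: 50 individuals in GI, 30 in GII and 20 in GIII.
value = ambi({"GI": 50, "GII": 30, "GIII": 20})
```

A sample consisting only of GI species gives 0 and one consisting only of GV species gives 6, the two ends of the scale.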
2.5. Pre-treatment
Square root transformations are used to decrease
the influence of the most abundant species in the PCA
ordinations and in the CDI modelling. The square root
transformation also reduces heteroscedasticity (Kvalheim
et al., 1994), which is important in the CDI
modelling. Square root transformation was not used
for the CA ordinations because the ordination patterns
for the square root transformed data were similar to the
ordinations using the raw data. Thresholding, i.e.
removal of rare species, was not used as this did not
change the analysis results.
3. Data
The data investigated in this work originate from
the environmental monitoring programme of petroleum
activities on the Norwegian shelf (SFT, 1997).
Data from three monitoring surveys performed in the
vicinity of three oil platforms are selected. The
samples are collected along four approximately
orthogonal bearings. One of the bearings lies along
the main current in the investigated areas. At each
sampling site five replicate samples are collected. The
exception is the reference site, where 10 replicates are
collected. The reference site is located far from the oil
rig, typically 10 000 m and beyond. The sampling
sites are in this work named stations. All analyses are
performed at a replicate level which makes it possible
to identify deviating replicates. Additionally, both
inter- and intra-station differences are
visualised when the data are analysed at a replicate
level. Obviously replicate level results can be difficult
to interpret if, say, two of the five replicates are polluted,
but in this work the focus is on the comparison of
methods. Usually the dilemma of deviating replicate
results is solved by reporting average values, and
mean CDI-values are just as representative as other
averaged statistics.
The three data sets are collected at Oseberg-Brage
(1996), Troll (1996) and Ekofisk (1990). All samples
are handled according to the prescriptions given by the
Norwegian Pollution Control Authorities (SFT, 1997).
The fields are located in the North Sea, between Great
Britain and Norway. The differences in the physical
conditions of the fields are presented in Table 1.
3.1. Data labelling
When referring to the stations, the notation AA-
ppp/qqq is used. AA is a code used to designate the
field. TA is used for Troll and EK for Ekofisk.
Oseberg-Brage is a cluster of three installations
indicated by F, C and BR, respectively. ppp is the
orientation of the axis that the station lies along. This
direction is given as the degrees relative to the north.
qqq gives the distance in metres between the platform
and the station. In the figures containing replicates all
the five replicates are labelled with the number of the
station. The numbering of the stations at Troll, Ekofisk
and Oseberg-Brage is shown in Tables 2, 3 and 4,
respectively. In those tables the Shannon–Wiener
index, the total number of species and the three most
abundant species at each station are included too. The
Shannon–Wiener index is chosen as the univariate
reference index since it is widely used in the analysis
of the type of data discussed in this paper. However, it
Table 1
The three investigated fields and their physical properties are listed
Field | Depths (m) | Sediment type | Main current | Oil production start
Troll | 304–311 | Fine grained clay and silt | From north-west | 1996 (after collection of samples)
Ekofisk | 67–76 | Medium sand | Uncertain, trend north and east | 1971
Oseberg-Brage | 108–142 | Medium/fine grained sand | From north-west | 1991
Table 2
The numbering of the stations from Troll together with their distance to the contamination source and the direction with regard to the north are listed
Numbering of stations | Troll | CDI | AMBI | Shannon–Wiener index | Total number of species | Total number of individuals | Most abundant species (abundances in parenthesis)
1 | TA-45/500 | 0.96 | 1.64 | 4.12 | 61 | 1390 | Thyasira ferruginea (245), Onuphis quadricuspis (90), Clymenura borealis (90)
2 | TA-120/2000 | 0.89 | 1.63 | 4.38 | 65 | 1340 | Thyasira ferruginea (160), Clymenura borealis (120), Onuphis quadricuspis (110)
3 | TA-140/800 | 1.00 | 1.54 | 4.25 | 59 | 1270 | Onuphis quadricuspis (125), Thyasira ferruginea (110), Clymenura borealis (100)
4 | TA-140/1200 | 0.97 | 1.75 | 4.43 | 68 | 1900 | Thyasira ferruginea (265), Onuphis quadricuspis (205), Clymenura borealis (130)
5 | TA-220/250 | 0.88 | 1.45 | 4.12 | 55 | 1260 | Thyasira ferruginea (200), Onuphis quadricuspis (145), Chaetozone setosa (95)
6 | TA-220/500 | 0.85 | 1.60 | 4.16 | 51 | 1090 | Onuphis quadricuspis (120), Thyasira ferruginea (105), Clymenura borealis (85)
7 | TA-220/1000 | 0.85 | 1.49 | 3.84 | 48 | 1165 | Onuphis quadricuspis (205), Thyasira ferruginea (120), Clymenura borealis (90)
8 | TA-270/250 | 1.01 | 1.58 | 4.33 | 69 | 1705 | Onuphis quadricuspis (250), Thyasira ferruginea (180), Clymenura borealis (155)
9 | TA-270/500 | 0.86 | 1.37 | 3.98 | 59 | 1390 | Onuphis quadricuspis (220), Thyasira ferruginea (220), Clymenura borealis (105)
10 | TA-270/1000 | 0.92 | 1.51 | 4.26 | 64 | 1630 | Onuphis quadricuspis (260), Clymenura borealis (165), Thyasira ferruginea (155)
11 | TA-360/250 | 0.94 | 1.55 | 4.29 | 68 | 1375 | Thyasira ferruginea (195), Onuphis quadricuspis (175), Clymenura borealis (110)
12 | TA-360/500 | 0.94 | 1.72 | 4.27 | 67 | 1555 | Thyasira ferruginea (230), Clymenura borealis (145), Onuphis quadricuspis (130)
13 | TA-360/1000 | 0.95 | 1.38 | 4.44 | 75 | 1650 | Onuphis quadricuspis (165), Thyasira ferruginea (150), Paramphinome jeffreysii (120)
14 | TA-360/2000 | 0.91 | 1.43 | 4.40 | 60 | 1320 | Onuphis quadricuspis (185), Thyasira ferruginea (170), Clymenura borealis (95)
15 | TA-180/10000 | 0.88 | 1.52 | 4.28 | 95 | 2665 | Thyasira ferruginea (405), Onuphis quadricuspis (225), Clymenura borealis (200)
For each station the Shannon–Wiener diversity index is calculated in addition to CDI and AMBI. Additionally the total number of observed
species, the total number of individuals, and the three most abundant species are given for each station. The numbers in parentheses are the
abundance of the species.
Table 3
The numbering of the stations from Ekofisk together with their distance to the contamination source and the direction with regard to the North are
listed
Numbering
of stations
Ekofisk CDI AMBI Shannon–
Wiener
index
Total
number of
species
Total
number of
individuals
Most abundant species (abundances in parenthesis)
1 EK-1/5700 0.86 1.46 3.94 54 320 Anthozoa indet. (61), Goniada maculata (36),
Sthenelais limicola (28)
2 EK-3/3300 0.92 1.46 4.13 49 305 Goniada maculata (44), Amphiura filiformis (28),
Sthenelais limicola (19)
3 EK-335/1800 1.78 3.55 2.10 21 270 Nemertini indet. (110), Capitella capitata (76),
Glycera alba (20)
4 EK-7/750 0.95 1.59 3.88 43 261 Goniada maculata (61), Amphiura filiformis (22),
Sthenelais limicola (20)
5 EK-74/6500 0.82 1.48 3.94 54 251 Goniada maculata (47), Amphiura filiformis (38),
Sthenelais limicola (20)
6 EK-74/3900 0.94 1.61 4.12 57 271 Goniada maculata (43), Sthenelais limicola (29),
Amphiura filiformis (22)
7 EK-84/1800 1.12 1.39 4.09 62 353 Goniada maculata (56), Amphiura filiformis (43),
Phoronis sp. (26)
8 EK-94/800 1.07 1.34 4.04 56 297 Goniada maculata (63), Amphiura filiformis (19),
Scoloplos armiger (18)
9 EK-137/5800 0.93 1.46 3.90 53 281 Amphiura filiformis (50), Goniada maculata (41),
Phoronis sp. (22)
10 EK-144/4400 0.91 1.47 3.85 53 289 Goniada maculata (53), Ophiura affinis (32),
Amphiura filiformis (30)
11 EK-148/2500 1.09 1.38 4.17 57 360 Amphiura filiformis (48), Goniada maculata (44),
Phoronis sp. (40)
12 EK-146/1300 1.20 1.37 4.38 66 433 Amphiura filiformis (46), Goniada maculata (44),
Phoronis sp. (41)
13 EK-140/850 1.17 1.40 4.52 72 358 Goniada maculata (45), Phoronis sp. (36),
Amphiura filiformis (29)
14 EK-175/500 1.94 1.01 3.55 58 817 Galathowenia oculata (413), Goniada maculata (62),
Ampharete falcata (43)
15 EK-217/6500 0.85 1.46 3.86 57 270 Goniada maculata (49), Amphiura filiformis (49),
Sthenelais limicola (20)
16 EK-217/4000 0.90 1.70 3.78 54 291 Goniada maculata (51), Amphiura filiformis (39),
Sthenelais limicola (24)
17 EK-208/2500 0.93 1.51 4.14 53 347 Amphiura filiformis (54), Goniada maculata (48),
Sthenelais limicola (25)
18 EK-200/1200 1.27 1.40 4.23 62 461 Galathowenia oculata (59), Goniada maculata (58),
Amphiura filiformis (42)
19 EK-291/5700 0.92 1.61 3.99 58 293 Goniada maculata (53), Amphiura filiformis (34),
Sthenelais limicola (24)
20 EK-287/4000 0.93 1.61 4.05 53 284 Goniada maculata (51), Amphiura filiformis (30),
Sthenelais limicola (22)
21 EK-288/1900 1.01 1.53 4.22 49 336 Phoronis sp. (45), Goniada maculata (42),
Sthenelais limicola (19)
22 EK-290/1000 1.12 1.59 4.28 55 337 Goniada maculata (50), Amphiura filiformis (35),
Phoronis sp. (21)
23 EK-337/450 3.51 0.27 1.20 40 1614 Galathowenia oculata (1373), Goniada maculata (71),
Levinsenia gracilis (20)
24 EK-90/30000 1.13 1.31 3.19 64 456 Amphiura filiformis (137), Goniada maculata (46),
Scoloplos armiger (40)
For each station the Shannon–Wiener diversity index is calculated in addition to CDI and AMBI. Additionally the total number of observedspecies, the total number of individuals, and the three most abundant species are given for each station. The numbers in parentheses are theabundance of the species.
G.R. Flaten et al. / Ecological Indicators 7 (2007) 254–276264
Table 4
The numbering of the stations from Oseberg-Brage together with their distance to the contamination source and the direction with regard to the
North are listed
Numbering
of stations
Oseberg-
Brage
CDI AMBI Shannon–
Wiener
index
Total
number of
species
Total
number of
individuals
Most abundant species (abundances in parenthesis)
1 F-135/350 2.12 3.99 2.01 44 751 Chaetozone setosa (508), Thyasira flexuosa (77),
Cerianthus lloydii (23)
2 F-135/500 1.42 3.10 3.16 45 390 Chaetozone setosa (109), Thyasira flexuosa (76),
Spiophanes bombyx (75)
3 F-135/750 0.82 2.26 2.72 41 356 Owenia fusiformis (150), Chaetozone setosa (60),
Spiophanes bombyx (48)
4 F-135/1500 0.83 1.69 1.54 50 1017 Owenia fusiformis (809), Myriochele oculata (58),
Spiophanes bombyx (18)
5 F135/10000 1.03 1.81 2.51 73 1054 Owenia fusiformis (636), Myriochele oculata (108),
Amphiura filiformis (66)
6 C-150/1000 2.37 4.01 1.75 42 983 Chaetozone setosa (730), Thyasira flexuosa (56),
Spiophanes bombyx (35)
7 C-150/1500 2.47 2.01 2.80 54 1190 Ditrupa arietina (394), Chaetozone setosa (279),
Owenia fusiformis (195)
8 C-180/250 3.92 4.67 2.31 34 2075 Chaetozone setosa (915), Capitella capitata (675),
Thyasira sarsii (132)
9 C-335/9900 0.94 1.75 2.30 64 550 Owenia fusiformis (356), Myriochele oculata (44),
Spiophanes bombyx (21)
10 BR-330/15000 0.84 1.72 2.74 51 343 Owenia fusiformis (187), Myriochele oculata (35),
Scoloplos armiger (11)
11 BR-150/250 4.93 4.98 1.08 17 2786 Capitella capitata (1833), Thyasira sarsii (892),
Eteone longa (26)
12 BR-150/500 1.46 3.80 2.63 38 306 Chaetozone setosa (133), Capitella capitata (37),
Thyasira flexuosa (28)
13 BR-150/1000 2.39 2.78 3.03 46 884 Chaetozone setosa (274), Thyasira flexuosa (243),
Ditrupa arietina (93)
14 BR-150/1500 1.87 2.21 3.06 45 676 Chaetozone setosa (145)Thyasira flexuosa (135),
Owenia fusiformis (106)
For each station the Shannon–Wiener diversity index is calculated in addition to CDI and AMBI. Additionally the total number of observed
species, the total number of individuals, and the three most abundant species are given for each station. The numbers in parentheses are the
abundance of the species.
has been pointed out that the Shannon–Wiener index
can be more insensitive than other diversity measures
(Gray, 2000).
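As a minimal sketch, the Shannon–Wiener index reported in Tables 2 to 4 can be computed from a vector of species counts as shown below. The logarithm base is an assumption: it is not stated in this section, but several tabulated values exceed the natural logarithm of the species count (for example 4.38 with 65 species at Troll station 2), which suggests base 2. The function name and the example counts are illustrative only.

```python
import math


def shannon_wiener(counts, base=2.0):
    """Shannon-Wiener diversity H' = -sum(p_i * log(p_i)) over species with count > 0.

    The base-2 default is an assumption, not something stated in the text.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one individual is required")
    diversity = 0.0
    for count in counts:
        if count > 0:
            proportion = count / total
            diversity -= proportion * math.log(proportion) / math.log(base)
    return diversity


# Four equally abundant species give log2(4) = 2 bits.
even_sample = shannon_wiener([10, 10, 10, 10])

# A sample dominated by one species scores much lower. These are only the three
# most abundant species listed for Ekofisk station 23, so the result is not the
# tabulated index for that station.
dominated_sample = shannon_wiener([1373, 71, 20])
```

A strongly dominated sample yields a low index regardless of how many individuals are present, which is why the index drops at the stations nearest the installations.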
Table 5
The common species observed in the three data sets

| | Troll | Ekofisk | Oseberg-Brage |
|---|---|---|---|
| Troll | 227 | 27 | 62 |
| Ekofisk | 27 | 197 | 68 |
| Oseberg-Brage | 62 | 68 | 252 |

The diagonal contains the number of species in each field.
4. Results
4.1. Local versus global analysis
The count data from all the three fields were
merged into one matrix. The species not present in a
sample were represented by zero. In Table 5, the
number of common species at the three fields is listed
together with the total number of species. Fig. 2 shows
the MDS plot of the Bray–Curtis dissimilarities for the
merged matrix. Samples from Troll, Ekofisk and
Oseberg-Brage are labelled ‘T’, ‘E’ and ‘O’,
respectively. The separation of the fields makes it
evident that the further analyses must be performed
field-wise, i.e. locally.
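The effect of merging fields with few shared species can be illustrated with a small sketch of the Bray–Curtis dissimilarity, where a species absent from a sample counts as zero. The samples below are truncated to the three most abundant species given in Tables 2 and 3, so the resulting numbers are illustrative only and are not the dissimilarities behind Fig. 2. The function and variable names are hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {species: count} samples."""
    species = set(sample_a) | set(sample_b)
    # A species missing from a sample is treated as a zero count,
    # as in the merged data matrix.
    difference = sum(abs(sample_a.get(s, 0) - sample_b.get(s, 0)) for s in species)
    total = sum(sample_a.get(s, 0) + sample_b.get(s, 0) for s in species)
    return difference / total if total else 0.0


# Top three species only (Tables 2 and 3); real samples contain many more.
troll_1 = {"Thyasira ferruginea": 245, "Onuphis quadricuspis": 90,
           "Clymenura borealis": 90}
troll_2 = {"Thyasira ferruginea": 160, "Clymenura borealis": 120,
           "Onuphis quadricuspis": 110}
ekofisk_1 = {"Anthozoa indet.": 61, "Goniada maculata": 36,
             "Sthenelais limicola": 28}

within_field = bray_curtis(troll_1, troll_2)
between_fields = bray_curtis(troll_1, ekofisk_1)
print(within_field, between_fields)
```

Samples that share no species reach the maximum dissimilarity, so between-field differences dominate any ordination of the merged matrix and mask the within-field structure.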
Fig. 2. MDS ordination of Bray–Curtis dissimilarities for all the
three fields. The samples from Troll, Ekofisk and Oseberg-Brage are
labelled ‘T’, ‘E’ and ‘O’, respectively. Observe that the three fields
clearly are separated.
4.2. Troll
The ordination methods indicate that there is no
clear gradient in the Troll data. The CA and PCA
ordination plots are shown in Fig. 3a and 3b,
respectively. Both plots show the first two ordination
components. The PCA ordination is performed on
square root transformed and centred data. One replicate
from station 4 stands out in the CA ordination. Capitella
capitata (Fabricius 1750) was abundant in this sample
(110 individuals). The remaining samples form a
single cluster. The deviating replicate is located in the
lower part of the PCA ordination plot, but the
separation is less pronounced there.
The dendrogram resulting from the Bray–Curtis
dissimilarities is shown in Fig. 3c. Average species
abundances are used and the nodes are labelled with
the corresponding station references. The stations
TA-220/500, TA-220/1000 and TA-140/800 are grouped
together at the top of the plot. However, the
dissimilarity is low (less than 0.35), so it is concluded
that there are no distinct clusters in the Troll data. The
MDS gives the same picture: no clusters are observed
(Fig. 3d), and there is no gradient in the MDS plot. The
high stress value (0.26) suggests that the representation
of the data is not very good. The reason for this is
probably that there are no groupings in the data.
The outermost stations (stations 2, 4, 7, 9, 12, 13, 14
and 15) are included in the reference set, except for the
outlying replicate from station 4, which is excluded.
The modelling calculations are performed on square
root transformed data. Cross validation suggests using
only one component in the model. The CDI-values
obtained from the modelling are shown in Fig. 3e. A
bar marks the CDI-value for each replicate and the
bars for the reference samples are shaded. The
sampling sites are separated by horizontal dotted
lines and each station is labelled by the location
details. Some replicates have CDI-values higher than
1.0 and should accordingly be classified as disturbed.
Still, mean CDI-values are all equal to or lower than
1.0 for all the stations (Table 2). The variation in CDI-
values for replicates is due to differences in their
species assemblage, and this variation also appears in
the other multivariate methods. The conclusion is that
there was no pollution at the Troll field.
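The distinction between replicate-level and station-level CDI-values can be sketched as follows. The replicate values are hypothetical and chosen only to show how a single replicate can exceed the limit of 1.0 while the station mean stays below it; the function names are illustrative.

```python
def station_mean_cdi(replicate_cdis):
    """Average the CDI-values of the replicates collected at one station."""
    return sum(replicate_cdis) / len(replicate_cdis)


def is_disturbed(cdi, limit=1.0):
    """A CDI-value above the limit of 1.0 is classified as disturbed."""
    return cdi > limit


# Hypothetical CDI-values for the five replicates of one station.
replicates = [0.82, 1.08, 0.91, 0.97, 0.88]

flagged_replicates = [cdi for cdi in replicates if is_disturbed(cdi)]
station_flag = is_disturbed(station_mean_cdi(replicates))
print(flagged_replicates, station_flag)
```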
AMBI supports these findings to some extent, as the
levels are low (1.4–1.7). However, on the biotic
index scale calibrated against the European
Water Framework Directive (Borja et al., 2004), all
stations are classified as “slightly disturbed”
(1.2 < AMBI ≤ 3.3). The “slightly disturbed” category
is characterised by a domination of species
tolerant to an excess of organic matter. This is
unexpected since the oil production at the field did not
start until after the studied data set was collected
(Table 1). A possible explanation is that a fairly high
percentage of the species in the species list is not
assigned to any particular ecological group
(17.5% on average), which calls for careful
interpretation of the pollution level (Borja and
Muxika, 2005).
4.3. Ekofisk
The multivariate analyses showed that the reference
station located 30 000 m from the oil rig (station
24) was different from the others. This is due to the
geographic distance to the other stations rather than to
pollution. The distance itself does not cause the
difference in the community structure, but environmental
factors such as depth, current and sediment
type are more likely to differ at large distances.
Therefore, station 24 is considered to be an outlier and
is excluded from the further analyses.
Fig. 3. The results obtained through the different multivariate analyses of the Troll data. (a) The CA ordination plot for the Troll data. The
numbers refer to the sampling site where the replicate is collected. One replicate from station 4 separates, but the rest of the samples locate in one
cluster. (b) The PCA ordination plot for the square root transformed Troll data. The numbers refer to the sampling site where the replicate is
collected. No particular replicate stands out, and there are no observable trends or groupings in the Troll data. (c) The dendrogram of the Bray–
Curtis dissimilarities calculated from the Troll data. No cluster separates, i.e. the Troll data are homogeneous. (d) MDS ordination of the Bray–
Curtis dissimilarities calculated from the Troll data. The numbers refer to the sampling site where the replicate is collected. Clearly, there are no
groupings in the data. (e) The CDI-values calculated for the Troll data. Shaded bars mark the reference stations. The vertical line indicates CDI-
values equal to 1.0, i.e. the border that separates disturbed and undisturbed stations. The horizontal dashed lines separate the replicates from the
different stations. The distance to the pollution source and the direction with regard to the north is included for each sampling site.
The CA ordination for the Ekofisk data is shown in
Fig. 4a. Station 23 and two replicates from station 14
separate from the others. In the lower part of the plot
one replicate from station 3 has an increased distance
to the main cluster. In the CA ordination without these
eight replicates station 3 separates from the main
cluster.
The PCA ordination for the square root trans-
formed data is shown in Fig. 4b. As in the CA
ordination station 23 and two replicates from station
14 separate from the main cluster. Additionally,
station 3 is separated from the main cluster along the
second component. Removing these samples gives an
ordination with one group of samples. The separation
of station 23 and the two replicates from station 14 is
due to high abundance of the species Galathowenia
oculata (Zachs 1923). Station 3 is characterised by an
elevated abundance of Nemertini indet. and Capitella
capitata (Fabricius 1750).
In the dendrogram of the Bray–Curtis dissimilarities
(Fig. 4c) the three stations EK-335/1800,
EK-337/450 and EK-175/500 separate from the main
cluster of samples. These are the same three stations
that were identified as different in the ordination plots
(Fig. 4a and b), i.e. stations 3, 23 and 14, respectively.
The MDS ordination is shown in Fig. 4d. Most of
the stations are grouped in one cluster while stations 3,
23 and two replicates from station 14 fall into distinct
groups. Removing them gives an MDS plot where all
the samples are collected in one group. Again, this
corresponds with the observations from the ordination
plots (Fig. 4a and b).
Ten stations are included in the reference set used
in the calculation of the CDI-values. These are stations
1, 2, 5, 6, 9, 10, 15, 16, 19 and 20. Together, they
represent all the bearings in the sampling grid.
Cross validation suggests four components in the
model of the square root transformed data. The
calculated CDI-values (Fig. 4e) indicate that the
Ekofisk field is overall moderately disturbed, i.e. the
CDI-values are generally a little higher than 1.0 for the
stations not belonging to the reference set. The bearing
along the main current (330°) is an exception to the
generally low CDI-values. Along this bearing the
station nearest the installation (station 23) has elevated
CDI-values, close to 4.0. The next station in this
direction, station 3, also has high CDI-values.
Additionally, two of the replicates collected at station 14,
which lies 500 m from the installation (Table 3),
clearly differ from the model. All these stations with
considerably elevated CDI-values can be regarded as polluted.
However, the CDI measurement reports the difference
between the test sample and the natural variation as
described by the model so any hypothesis about
pollution needs to be confirmed by additional data, e.g.
the species abundance tables. The high CDI-values for
station 23 are due to an elevated abundance (1373
individuals) of Galathowenia oculata (Zachs 1923)
(Table 3). This strengthens the pollution hypothesis as
Galathowenia oculata is known to have opportunistic
properties. Also in the case of station 14 elevated
levels of Galathowenia oculata are the reason for the
high CDI levels, but as only two replicates are affected
any conclusion about pollution is more uncertain. It is
worth noting that the samples identified as polluted
by the CDI approach are identical to the
samples detected as different by the ordination
methods.
AMBI classifies the majority of the stations as
“slightly disturbed” (1.2 < AMBI ≤ 3.3) (Table 3).
The exceptions are, as for the other methods, stations
3, 14 and 23. However, stations 14 and 23 are judged
to be undisturbed (0 ≤ AMBI ≤ 1.2) while station
3 is “moderately disturbed” (3.3 < AMBI ≤ 4.3).
This does not correspond with the other methods or
with preliminary expectations: the oil rig is the point
source for pollution, so the least polluted stations
should not be those nearest the rig. For
stations 14 and 23 only 6.2 and 1.3% of the species,
respectively, are not assigned to an ecological group, but for
the other stations this percentage is higher. The
maximum is for station 3, where 44.8% of the species
are not assigned, while the average is 16.4%.
Therefore, one should be careful with interpreting
the AMBI levels as pollution levels (Borja and
Muxika, 2005).
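The AMBI ranges quoted in this section can be written as a small lookup, sketched below and applied to the three deviating Ekofisk stations using the AMBI values from Table 3. Only the three ranges given in the text are encoded; categories above 4.3 are not quoted here and are therefore left unnamed. The function name is hypothetical.

```python
def ambi_category(ambi):
    """Map an AMBI value to the disturbance categories quoted in this section."""
    if ambi < 0:
        raise ValueError("AMBI cannot be negative")
    if ambi <= 1.2:
        return "undisturbed"
    if ambi <= 3.3:
        return "slightly disturbed"
    if ambi <= 4.3:
        return "moderately disturbed"
    # Higher categories are not quoted in this section, so none is invented here.
    return "above the ranges quoted here"


# AMBI values for Ekofisk stations 3, 14 and 23 (Table 3).
ekofisk_ambi = {3: 3.55, 14: 1.01, 23: 0.27}
categories = {station: ambi_category(value)
              for station, value in ekofisk_ambi.items()}
print(categories)
```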
Fig. 4. The results obtained through the different multivariate analyses of the Ekofisk data. (a) The CA ordination plot for the Ekofisk data. The numbers refer to the sampling site where the replicate is collected. The replicates from station 23 and two replicates from station 14 separate, and the analyses suggest that they are polluted. The rest of the samples assemble in one group. (b) The PCA ordination plot for the square root transformed Ekofisk data. The numbers refer to the sampling site where the replicate is collected. As observed in the CA ordination the replicates from station 23 and two replicates from station 14 separate. Additionally, station 3 separates along the second component. Subsequent analyses show that station 3 is polluted. (c) The dendrogram of the Bray–Curtis dissimilarities calculated from the Ekofisk data. (d) MDS ordination of the Bray–Curtis dissimilarities calculated from the Ekofisk data. The numbers refer to the sampling site where the replicate is collected. Exactly as in the dendrogram stations 3, 23 and some replicates from station 14 separate. (e) The CDI-values calculated for the Ekofisk data. Shaded bars mark the reference stations. The vertical line indicates CDI-values equal to 1.0, i.e. the border that separates disturbed and undisturbed stations. The horizontal dashed lines separate the replicates from the different stations. The distance to the contamination source and the direction with regard to the North is included for each sampling site.
4.4. Oseberg-Brage
The ordination plots for the Oseberg-Brage data
clearly show that gradients are present. In the CA
ordination (Fig. 5a) both stations 11 and 8 differ from
the main cluster, and in the lower part of the plot
stations 4, 5, 9 and 10 are partly separated from the
main group of stations. According to the lists of
the most abundant species (Table 4), stations 11 and
8 have elevated abundances of two species each
(Capitella capitata (Fabricius 1750) and Thyasira
sarsii (Philippi 1845) at station 11; Chaetozone setosa
(Malmgren 1867) and Capitella capitata at station 8),
while stations 4, 5, 9 and 10 have elevated abundances
of one species (Owenia fusiformis (Delle Chiaje 1842)).
This pattern, where one or a few species have clearly
elevated abundances compared to other species at the
site, is typical for sites that are exposed to environmental
stress, e.g. pollution. However, Owenia fusiformis is
not known to have any strong opportunistic properties,
so stations 4, 5, 9 and 10 are expected to be
unpolluted or only moderately affected by pollution,
especially as they are the stations located furthest
away from the pollution source, the oil rig.
Fig. 5. The results obtained through the different multivariate analyses of the Oseberg-Brage data. (a) The CA ordination plot for the Oseberg-Brage data. The numbers refer to the sampling site where the replicate is collected. The replicates from stations 11 and 8 clearly separate, and the further analyses prove that these stations are heavily polluted. (b) The PCA ordination plot for the square root transformed Oseberg-Brage data. The numbers refer to the sampling site where the replicate is collected. As observed in the CA ordination the replicates from stations 11 and 8 clearly separate. (c) The dendrogram of the Bray–Curtis dissimilarities calculated from the Oseberg-Brage data. The three installations on the field are indicated by F, C and BR. (d) MDS ordination of the Bray–Curtis dissimilarities calculated from the Oseberg-Brage data. The disturbed stations are found in the upper part of the plot while the unpolluted samples (stations 3, 4, 5, 9, 10) are found in the lower part. (e) The CDI-values calculated for the Oseberg-Brage data. Shaded bars mark the reference stations. The vertical line indicates CDI-values equal to 1.0, i.e. the border that separates disturbed and undisturbed stations. The horizontal dashed lines separate the replicates from the different stations. The distance to the contamination source and the direction with regard to the North is included for each sampling site. The CDI-values decrease along increasing distance to the contamination source.
The PCA ordination of the square root transformed
data (Fig. 5b) shows traces of the well-known arch
effect (Gauch et al., 1977) which is an indicator of a
very strong gradient in the data. Station 11 is located in
the upper left of the plot while the undisturbed stations
4, 5, 9 and 10 appear in the upper right part. Along the
arch the stations become less disturbed. Observe that
although both the ordination plots revealed clear
pollution gradients, it is necessary to include addi-
tional information in order to decide the direction of
the gradient.
The dendrogram representing the Bray–Curtis
dissimilarities is shown in Fig. 5c. There are two
main groups of stations in the data: the five lower
nodes in Fig. 5c versus the rest of the nodes, except
BR-150/250, which is different from both groups.
BR-150/250 is station 11, which also
separated from the other stations in the two ordination
plots (Fig. 5a and b). The two main groups are possibly
clustered according to their disturbance level, with the
lower group as the least polluted. The motivation for
this assumption is that this group contains the stations
located furthest away from the three installations at
Oseberg-Brage, i.e. F, C and BR.
The MDS ordination of the dissimilarities (Fig. 5d)
is very similar to the PCA ordination (Fig. 5b). The
MDS plot also corresponds well with the dendrogram
(Fig. 5c) as station 11 clearly separates from the rest of
the stations, which divide into two main groups. The
clusters are, however, less distinct as there is large
variation between replicate samples, particularly at
stations 10, 12 and 3.
The stations located furthest away from the oil rig
were included in the reference set, i.e. stations 3, 4, 5,
9 and 10. Cross validation of the square root
transformed reference set suggested two components
in the model. Calculating the model and fitting the
samples resulted in the CDI-values shown in Fig. 5e.
Station 11 and station 8 are the most disturbed
stations. Both these stations have RSD-values about
four times higher than the border of the model and
according to Table 4 these differences are probably
caused by pollution. The rest of the stations not
included in the reference set are also somewhat
disturbed. Compared to Ekofisk the Oseberg-Brage
field is considerably more polluted, as can be observed
from the higher CDI-values.
The CDI-values show clear decreasing tendencies
outward from the installations. Surprisingly, station 12
violates this trend. This may be due to the few
observed individuals in the replicates from this station.
As mentioned in the first paragraph of this section, one
species (Owenia fusiformis (Delle Chiaje 1842)) has
elevated abundances at the reference stations. This
means that possibly the reference stations also are
somewhat affected by pollution. In this case the CDI-
values report the level of pollution compared to a
moderately polluted reference set. Hence, the CDI-
values and accordingly the pollution gradients should
be even higher than reported here.
AMBI classifies stations 1, 6, 8, 11 and 12 as
“moderately disturbed” (3.3 < AMBI ≤ 4.3) (Table 4)
and the rest of the stations as “slightly disturbed”
(1.2 < AMBI ≤ 3.3). All the stations included in the
reference set for the CDI calculations are in the least
disturbed group, and although there is no direct relation
between the CDI and AMBI results, the correlation
between them is 0.8 for this data set. The average
percentage of species not assigned to any ecological
group is as low as 3.7% for Oseberg-Brage.
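The reported correlation can be checked against the station-level CDI and AMBI values listed in Table 4. The sketch below assumes that the figure refers to the Pearson correlation of those tabulated station values, which is not stated explicitly in the text; the variable and function names are illustrative.

```python
import math

# Station-level values for Oseberg-Brage stations 1 to 14 (Table 4).
cdi = [2.12, 1.42, 0.82, 0.83, 1.03, 2.37, 2.47,
       3.92, 0.94, 0.84, 4.93, 1.46, 2.39, 1.87]
ambi = [3.99, 3.10, 2.26, 1.69, 1.81, 4.01, 2.01,
        4.67, 1.75, 1.72, 4.98, 3.80, 2.78, 2.21]


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(spread_x * spread_y)


correlation = pearson(cdi, ambi)
print(round(correlation, 2))
```

Under this assumption the tabulated values reproduce a correlation that rounds to 0.8.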
5. Discussion and conclusions
The three oil fields studied herein are polluted at
various levels. Troll is not disturbed, Ekofisk is
moderately disturbed and Oseberg-Brage is heavily
disturbed. The investigated multivariate methods
agree on the identification of clusters of samples
but for all of them the identification of the disturbed
samples relies on lists of species abundance. AMBI
has an additional requirement which is preliminary
assignment of the species to one of five ecological
groups. On the other hand, if an assigned species list
covering all the species in the sample is available
AMBI can provide a classification in accordance with
the Water Framework Directive (2000/60/EC)
directly. In this work the AMBI results from Troll
and Ekofisk must be carefully interpreted due to a high
percentage of species not assigned to any ecological
group. For Oseberg-Brage the percentage of unassigned
species was low, and this was also the data
set for which AMBI corresponded best with the other
multivariate methods, including the CDI.
The multivariate methods differ in their level of
information. The methods based on the dissimilarity
matrix do not contain quantitative information. This is
obvious, since the Bray–Curtis dissimilarity is not a
Euclidean distance measurement (Borg and Groenen,
1997, p. 55). Although quantitative information is not
kept, there are methods that can extract biologically
significant information from dissimilarity measures
(e.g. Clarke and Warwick, 1994), but this is not
discussed further here. In
the ordination plots the quantitative differences are
conserved and to some extent it is possible to extract
them visually. On the other hand, it is not obvious how
to report the quantitative information beyond pre-
senting the ordination plots. In modelling, the model
describes the natural variation and the samples are
compared to this model. The samples’ distances to the
model are a measurement of the difference to the
natural variation. Thus, the CDI-values, which
correspond to normalised distances, can be used as
a quantitative measurement of pollution in an area
with relatively homogeneous natural variation. Samples
with CDI-values lower than 1.0 are not disturbed
at the selected level of significance (Eq. (10)).
Samples which have CDI-values higher than 1.0
are disturbed and the size of the CDI-value
corresponds to the level of disturbance (pollution).
Thus, the list of CDI-values for the samples in a data
set gives a quantitative report of the level of pollution
at each station. Such a list makes it possible to
compare the pollution level in different surveys for a
field monitored several times, for instance. Another
application is ranking of relative pollution levels
among different fields which is difficult by means of
ordination methods and the qualitative dissimilarity
matrix methods.
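The idea of a normalised distance to a model of the natural variation can be sketched as below. This is a simplified illustration and not the authors' CDI calculation: it fits a one-component principal component model to square root transformed reference samples, and divides a sample's residual standard deviation by the pooled residual standard deviation of the reference set. The significance limit of Eq. (10) is not reproduced in this section, so the `limit_factor` parameter is a hypothetical stand-in, degrees-of-freedom corrections are omitted, and all counts are invented.

```python
import math


def sqrt_transform(counts):
    return [math.sqrt(c) for c in counts]


def residual_sd(centred, loading):
    """Residual standard deviation after projecting onto one component."""
    score = sum(x * l for x, l in zip(centred, loading))
    residual = [x - score * l for x, l in zip(centred, loading)]
    return math.sqrt(sum(e * e for e in residual) / len(residual))


def fit_reference_model(reference_counts, iterations=200):
    """Return (mean, loading, pooled_rsd) of a one-component model."""
    data = [sqrt_transform(row) for row in reference_counts]
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - mean[j] for j in range(p)] for row in data]
    # Power iteration for the first principal component, started from the
    # centred reference sample with the largest norm.
    loading = max(centred, key=lambda row: sum(v * v for v in row))
    for _ in range(iterations):
        scores = [sum(x * l for x, l in zip(row, loading)) for row in centred]
        loading = [sum(scores[i] * centred[i][j] for i in range(n))
                   for j in range(p)]
        norm = math.sqrt(sum(l * l for l in loading))
        if norm == 0.0:
            raise ValueError("reference samples show no variation")
        loading = [l / norm for l in loading]
    squared = [residual_sd(row, loading) ** 2 for row in centred]
    pooled_rsd = math.sqrt(sum(squared) / n)
    if pooled_rsd == 0.0:
        raise ValueError("reference samples fit the model perfectly")
    return mean, loading, pooled_rsd


def disturbance_index(sample_counts, model, limit_factor=1.0):
    """Sample residual distance divided by the (scaled) reference residual."""
    mean, loading, pooled_rsd = model
    centred = [v - m for v, m in zip(sqrt_transform(sample_counts), mean)]
    return residual_sd(centred, loading) / (limit_factor * pooled_rsd)


# Invented counts for three species at four reference stations.
reference = [[16, 9, 4], [25, 16, 9], [36, 25, 4], [49, 36, 9]]
model = fit_reference_model(reference)

similar = disturbance_index([25, 16, 6], model)    # resembles the reference set
dominated = disturbance_index([4, 4, 400], model)  # one species dominates
print(similar, dominated)
```

A sample resembling the reference set stays well below 1, while a sample dominated by a single species lies far from the model, which mirrors how elevated abundances of opportunistic species drive the high CDI-values reported above.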
It is not emphasised in this work, but the
modelling approach depends on selecting a good
reference set so as to avoid misleading CDI-values.
A good reference set gives a good representation of
the natural variation in the investigated area. The
consequence of a poor reference set is an incorrect
border of the model, i.e. the calculated CDI-values
cannot be regarded as reliable pollution estimates.
The reference set is poor when too few stations are
included (the natural variation is not fully spanned)
and it is poor if too many, i.e. disturbed stations, are
included (the model becomes too wide). Combining
available information regarding the test area and the
ordinations is the evident way of selecting reference
stations. For the cases studied in this work typically
the outermost stations (those furthest away from the
pollution source) are included in the reference set.
Other constraints can be feasible if the CDI approach
is used in other applications.
AMBI is also a multivariate index that uses the
quantitative information in the abundance data as well
as the species identity. AMBI builds on previously
proposed ecological models (Borja et al., 2000) and a
preliminary classification of species in one of five
ecological groups. This can be difficult if a new region
is investigated, cf. the high percentage of unassigned
species for Troll and Ekofisk. However, over time it
should be possible to build a species list covering any
region of interest. The challenge, though, is that a
certain level of knowledge about a species is required
in order to assign it to an ecological group.
Hence, the two critical points in AMBI are (i) that the
underlying model is correct and valid for all regions
and (ii) that all species are assigned, and correctly so,
to an ecological group.
In the earlier mentioned Water Framework Directive
(2000/60/EC) it is proposed to categorize water
quality in five classes. The scientists behind AMBI have
taken this into account and proposed how AMBI can
be used in classification in accordance with the
directive (Borja et al., 2004). No such work has been
done to adapt the CDI to the EC’s water directive.
However, the authors believe that the modelling
approach should be well suited to meet the requirements
in the directive, particularly since the directive
relates all the pollution or disturbance levels to “normal
conditions”. Thus, if the scientist is able to choose a
reference set that reflects the normal conditions, CDI
is a direct reflection of how different the ecological
status of stations is from these normal conditions.
Combining methods is the best approach to
analysis of benthic count data. Some kind of clustering
or ordination is a good start. If the ordination separates
any groups the tables of species abundances help to
distinguish polluted and undisturbed samples. The
undisturbed samples can be brought together in a
reference set, and subsequently a model of the natural
variation in the sampling area can be calculated from
this reference set. Comparing the samples to the
calculated model gives the CDI-values, which report
quantitative information regarding the relative level of
disturbance in the monitoring area.
Even though the CDI approach is confined to
estimation of pollution levels in this work, it is
believed that the approach can also be useful in other
applications where it is of interest to quantify, e.g.,
differences in ecological status. The prerequisite is a
set of samples that define a reference area or state
to which other samples can be compared. Hence, the
different samples’ similarities to the pre-defined
reference state are calculated and various environmental
gradients can be quantified.
Acknowledgements
Norsk Hydro ASA and Phillips Petroleum Company
Norway are thanked for giving access to their
data from the regular survey and monitoring programme of
the oil fields on the Norwegian shelf. The Norwegian
Research Council (NFR) is thanked for its financial
support to the project.
References
Borg, I., Groenen, P., 1997. Modern Multidimensional Scaling—
Theory and Applications. Springer, New York.
Borja, A., Franco, J., Perez, V., 2000. A marine biotic index to
establish the ecological quality of soft-bottom benthos within
European estuarine and coastal environments. Mar. Pollut. Bull.
40 (12), 1100–1114.
Borja, A., Franco, J., Muxika, I., 2004. The biotic indices and the
Water Framework Directive: the required consensus in the new
benthic monitoring tools. Mar. Pollut. Bull. 48, 405–408.
Borja, A., Muxika, I., 2005. Guidelines for the use of AMBI (AZTI’s
marine biotic index) in the assessment of the benthic ecological
quality. Mar. Pollut. Bull. 50, 787–789.
Bray, J.R., Curtis, J.T., 1957. An ordination of the upland forest
communities of southern Wisconsin. Ecol. Monogr. 27, 325–
349.
Clarke, K.R., 1990. Comparison of dominance curves. J. Exp. Mar.
Biol. Ecol. 138 (1–2), 130–143.
Clarke, K.R., Warwick, R.M., 1994. Change in Marine Commu-
nities: An Approach to Statistical Analysis and Interpretation.
Plymouth Marine Laboratory, UK.
Diaz, R.J., Solan, M., Valente, R.M., 2004. A review of approaches
for classifying benthic habitats and evaluating habitat quality. J.
Environ. Manage. 73, 165–181.
Everitt, B., 1980. Cluster Analysis, 2nd ed. Heinemann, London.
Flaten, G.R., 2002. Dynamic environmental monitoring by means of
multivariate modelling. Ph.D. Thesis. University of Bergen,
Norway.
Flaten, G.R., Grung, B., Kvalheim, O.M., 2004. A method for
validation of reference sets in SIMCA modelling. Chemom. Intell.
Lab. Syst. 72, 101–109.
Gauch, H.G., Whittaker, R.H., Wentworth, T.R., 1977. A compara-
tive study of reciprocal averaging and other ordination techni-
ques. J. Ecol. 65, 157–174.
Gray, J.S., 2000. The measurement of marine species diversity, with
an application to the benthic fauna of the Norwegian continental
shelf. J. Exp. Mar. Biol. Ecol. 250, 23–49.
Hirschfeld, H.O., 1935. A connection between correlation and
contingency. Proc. Camb. Philos. Soc. 31, 520–524.
Hotelling, H., 1933. Analysis of a complex of statistical variables
into principal components. J. Educ. Psychol. 24, 417–444 and
498–520.
Hurlbert, S.H., 1971. The nonconcept of species diversity: a critique and
alternative parameters. Ecology 52, 577–586.
Johnson, R.A., Wichern, D.W., 1992. Applied Multivariate Statis-
tical Analysis, 3rd ed. Prentice Hall, New Jersey.
Jongman, R.H.G., Ter Braak, C.J.F., Van Tongeren, O.F.R., 1995.
Data Analysis in Community and Landscape Ecology. Cam-
bridge University Press, Cambridge, UK.
Kruskal, J.B., 1964. Multidimensional scaling by optimizing
goodness of fit to a nonmetric hypothesis. Psychometrika 29,
1–27.
Kvalheim, O.M., Brakstad, F., Liang, Y.-Z., 1994. Preprocessing of
analytical profiles in the presence of homoscedastic or hetero-
scedastic noise. Anal. Chem. 66 (1), 43–51.
Massart, B.G.J., 1997. Environmental monitoring and forecasting by
means of multivariate methods. Ph.D. Thesis. University of
Bergen, Norway.
Massart, B.G.J., Kvalheim, O.M., Libnau, F.O., Ugland, K.I.,
Tjessem, K., Bryne, K., 1996. Projective ordination by
SIMCA: a dynamic strategy for cost-efficient environmental
monitoring around offshore installations. Aquat. Sci. 58 (2),
120–138.
Olsgard, F., Gray, J.S., 1995. A comprehensive analysis of the effects
of offshore oil and gas exploration and production on the benthic
communities of the Norwegian continental shelf. Mar. Ecol.
Progr. Ser. 122, 277–306.
Pielou, E.C., 1966. The measurement of species diversity in differ-
ent types of biological collections. J. Theoret. Biol. 13, 131–144.
Rosenberg, D.M., Resh, V.H. (Eds.), 1993. Freshwater Biomonitoring
and Benthic Macroinvertebrates. Chapman and Hall,
London.
Shannon, C.E., Weaver, W., 1949. The Mathematical Theory of
Communication. University of Illinois Press, Urbana.
Shepard, R.N., 1962. The analysis of proximities: multidimensional
scaling with an unknown distance function. Psychometrika 27,
125–140.
The Norwegian Pollution Control Authority (SFT), 1997. Guide-
lines for environmental monitoring of petroleum activities on the
Norwegian shelf, TA: 1424/1997.
Wold, S., 1976. Pattern recognition by means of disjoint principal
components models. Pattern Recogn. 8, 127–139.
Wold, S., 1978. Cross-validatory estimation of the number of
components in factor and principal components models. Tech-
nometrics 20, 397–405.
Wold, S., Esbensen, K., Geladi, P., 1987. Principal component
analysis. Chemom. Intell. Lab. Syst. 2, 37–52.