Lagrangian analysis by clustering

Ocean Dynamics, DOI 10.1007/s10236-010-0306-2

Inga Monika Koszalka · Joseph H. LaCasce

Received: 15 February 2010 / Accepted: 21 May 2010 / © Springer-Verlag 2010

Abstract We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in geographical bins, we group them by nearest-neighbor distance using a clustering algorithm. This yields sets with approximately the same number of observations, covering unequal areas. A major advantage is that, because the number of observations is the same for the clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter represents the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities and compare the results with the corresponding quantities from the stochastic model. We perform a similar comparison with the means and diffusivities obtained with geographical bins. Clustering is more successful at capturing the mean flow and improves convergence in the eddy diffusivity estimates. We discuss both the advantages and shortcomings of the new method.

Keywords Lagrangian analysis · Eddy diffusivity · Binning · Clustering

Responsible Editor: John Grue

I. M. Koszalka (B) · J. H. LaCasce, Department of Geosciences, University of Oslo, P.O. Box 1022, Blindern, 0315 Oslo, Norway. E-mail: [email protected]

1 Introduction

Lagrangian instruments, surface drifters and subsurface floats, are widely used for measuring oceanic velocities. Their increased use in recent decades has resulted in coverage over large parts of the world oceans (e.g., http://www.aoml.noaa.gov/phod/dac/gdp.html). Given the amount of data being generated, it is important to continually improve our analysis techniques, to extract as much information as possible from that data.

There is a wide range of Lagrangian data analysis techniques (LaCasce 2008). The most common technique involves estimating Eulerian mean velocities and diffusivities. With these quantities, one can write an advection-diffusion equation describing the evolution of a tracer (Davis 1991):

$$\partial_t \langle\theta\rangle + \mathbf{U}\cdot\nabla\langle\theta\rangle = \nabla\cdot\left(\mathbf{K}\,\nabla\langle\theta\rangle\right) \tag{1}$$

Lagrangian data can be used to determine U and K, the time-mean velocity and the eddy diffusivity tensor, both of which can vary in space.

The method for calculating U and K is described by Davis (1991). Consider a data set covering a certain region. The drifter trajectories are used to calculate velocities along the drifter paths, by differencing. Then these velocities are grouped in geographical bins of a specified size to estimate the mean velocities in the bins (Fig. 1a). The means pertain to the period spanned by the data set. One assumes that the sampling in the bins is sufficient to capture the actual Eulerian means and that the statistics are stationary over this period. Examples of such calculations are found in Rossby et al. (1983), Owens (1991), Poulain et al. (1996), Swenson and Niiler (1996), and Fratantoni (2001).
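The bin-averaging step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Davis (1991) in full: the function name `bin_means`, the `(lon, lat, u, v)` record layout, and the toy observations are all hypothetical.

```python
import math
from collections import defaultdict

def bin_means(obs, dlon=2.0, dlat=1.0):
    """Average (u, v) in geographical bins of size dlon x dlat degrees.

    obs: iterable of (lon, lat, u, v) tuples (hypothetical record layout).
    Returns {(i, j): (mean_u, mean_v, n)} keyed by integer bin index.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lon, lat, u, v in obs:
        # Integer bin index from the floor of position / bin size.
        key = (math.floor(lon / dlon), math.floor(lat / dlat))
        s = sums[key]
        s[0] += u
        s[1] += v
        s[2] += 1
    return {k: (su / n, sv / n, n) for k, (su, sv, n) in sums.items()}

# Toy data (assumed): three observations fall in one bin, one in another.
obs = [(0.5, 64.2, 10.0, 0.0), (1.5, 64.8, 20.0, 2.0),
       (1.9, 64.1, 30.0, 4.0), (2.1, 64.5, -5.0, 1.0)]
means = bin_means(obs)
```

Note that the number of observations `n` is returned with each mean; it differs from bin to bin, which is the source of the uneven statistical accuracy discussed in the text.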

Fig. 1 (a) A sketch showing Lagrangian observations grouped in geographical bins. (b) Lagrangian data partitioned by the clustering algorithm under the constraint of a prescribed number of members in a cluster

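The diffusivity calculation stems from that of Taylor (1921). For example, in the zonal direction, this is:

$$\kappa_{xx}(t) \equiv \frac{1}{2}\frac{d}{dt}\langle x_L^2(t)\rangle = \langle x_L(t)\,u_L(t)\rangle = \int_0^t \langle u_L(t)\,u_L(\tau)\rangle\,d\tau = \int_0^t P_{xx}(\tau)\,d\tau \tag{2}$$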

where x_L is the Lagrangian displacement, u_L the Lagrangian velocity, and P(τ) the time-lagged Lagrangian velocity covariance. Davis (1991) allows for the diffusivity to also vary in space. To calculate this, one replaces the velocities above with "residual velocities", those with the mean removed, and does the same with the displacements. The diffusivities are obtained for each bin as averages over all trajectories in the bin. As such, the diffusivity is a mixed Eulerian–Lagrangian measure. It is Lagrangian because it involves integrating along particle paths, but it is Eulerian because the integral occurs for drifters in a specified area and because it involves subtracting the Eulerian mean.

There are a number of practical issues with regard to binning (e.g., Mariano and Ryan 2007). One concerns the bin size. The bins should be small enough to resolve the mean flow but larger than the scale of the energy-containing eddies. They should also be large enough to yield a statistically significant estimate. The significance necessarily varies between bins, as the amount of data in each bin varies. Such variations can lead to bias errors (Davis 1991).

The diffusivities are similarly affected by the bin size. We assume that the diffusivity converges at long times, i.e., κ(x, t) → κ_∞(x) as t → ∞, where x is the position vector. However, the integration time in Eq. 2 depends on the time a drifter spends in the bin, and this will generally differ between individual drifters in the same bin. As such, the mean autocorrelation derives from segments of differing lengths, and this can affect the convergence of the integral (see below). Using larger bins improves this, by allowing for longer individual segments, but some tracks will always be shorter than others.

The binning technique has been widely applied to ocean data, and different bin sizes and even different bin shapes and orientations have been explored (e.g., Swenson and Niiler 1996; Falco et al. 2000; Poulain 2001; Jakobsen et al. 2003; Lumpkin and Garraffo 2005; Davis 1998; Thompson et al. 2009). Improvements such as fitting the binned velocities with cubic splines (Bauer et al. 2002), using different sized bins for the means and diffusivities (e.g., Poulain et al. 1996; Swenson and Niiler 1996), using different asymptotic limits for the diffusivity integration (e.g., Poulain et al. 1996; Brink et al. 2000; Thompson et al. 2009), and using different equivalent formulations to Eq. 2 (e.g., Colin de Verdiere 1983; Zhurbas and Oh 2003) have all been explored.

Hereafter, we examine an alternative idea. Rather than grouping the velocities in bins of fixed size, we group a specified number of nearest-neighbor realizations together (Fig. 1b) using a clustering algorithm. Such algorithms are used in diverse fields, such as data mining, image processing, and bioinformatics (Lloyd 1982; Kanungo et al. 2002; MacKay 2003). Specifying the number of members in the cluster then determines the number and spatial extent of the clusters for the whole data set.

The resulting mean velocities are on a nonuniform grid. However, the coverage is determined by the data; we do not obtain estimates where there are few or no measurements. A major advantage, though, is that there are approximately the same number of realizations in each cluster. As such, the standard error will depend only on the standard deviation of the velocity rather than also depending on the number of observations, as it does in a bin.

The calculation of the diffusivities also differs. First, we evaluate the velocity autocorrelation with Eq. 2 for a chosen fixed period of time. We assign a position to each autocorrelation (the midpoint along the trajectory segment) and then cluster those positions. We then average the autocorrelations in the cluster, with each cluster having a prescribed number of segments. The average is then integrated over the time interval equal to the segment length to obtain an estimate of κ_∞(x). The length and number of contributing trajectories are thus the same, and these values can be adjusted to improve convergence.

The method of calculating the diffusivity is similar to that used previously by Garraffo et al. (2001), Lumpkin and Flament (2001), Lumpkin et al. (2002), and Rupolo (2007). These authors also used trajectory segments of a fixed length in calculating the diffusivity. In contrast though, most used mean velocities from individual trajectories rather than the interpolated Eulerian means estimated from the entire data set. And their estimates were grouped into geographical bins, yielding different numbers of data points in each bin.

We illustrate the clustering method hereafter using synthetic trajectories. The latter are generated with a first-order stochastic model, using mean velocities representative of the surface currents in the Nordic Seas. The result is a data set with known mean velocities and diffusivities, allowing us to test the accuracy of our estimates. In addition, we calculate corresponding estimates using bins and compare the results. The currents in the Nordic Seas are narrow and strongly inhomogeneous, so this is a fairly strenuous test. Using synthetic data also ensures that we are not limited by the size of the data set.

Previous authors have used stochastic models for Lagrangian analysis (e.g., Griffa 1996; Falco et al. 2000; Garraffo et al. 2001; Veneziani et al. 2004; Rupolo 2007; Sallee et al. 2008). The goal in these studies was to use the stochastic models to reproduce dispersion characteristics in observations. We are treating the stochastic trajectories as the observations, as was done, for example, by Bauer et al. (1998). Davis (1991) used synthetic trajectories in this way, to evaluate estimation errors under binning. However, he did not address the dependence on bin size, an issue addressed here.

The paper is organized as follows: The study region and simulated Lagrangian particles are described in Section 2. In Section 3, we consider mean velocities, and eddy diffusivities are addressed in Section 4. We discuss the results in Section 5.

2 Data

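For the synthetic trajectories, we employ a first-order stochastic model (e.g., Griffa 1996), for which the particle positions are given by:

$$\begin{aligned}
dx_i &= \left(u_i + U(x, y)\right) dt, \qquad dy_i = \left(v_i + V(x, y)\right) dt,\\
du_i &= -\frac{1}{T_L}\,u_i\,dt + \sqrt{\frac{2}{T_L}}\;\nu\,dw,\\
dv_i &= -\frac{1}{T_L}\,v_i\,dt + \sqrt{\frac{2}{T_L}}\;\nu\,dw.
\end{aligned} \tag{3}$$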

The subscript refers to the particle, (U, V) is the background mean flow, ν is the square root of the eddy velocity variance, T_L is the Lagrangian integral time scale, and dw is a Wiener (normal) noise process. The two components of the velocity, u and v, are assumed independent. The velocity autocorrelation is given by:

$$P(\tau) = \langle u(t)\,u(t+\tau)\rangle = \nu^2 e^{-\tau/T_L}. \tag{4}$$

From Eq. 2, the diffusivities have the asymptotic value of κ_∞ = ν²T_L.

As noted, we use estimates of the surface currents in the Nordic Seas for the mean velocities, (U, V). The dominant feature here is the Norwegian Atlantic Current, off the western Norwegian coast. This is 20–30 km wide in its core, a distance somewhat larger than the deformation-scale eddies (5–10 km) which are ubiquitous here (Poulain et al. 1996; Skagseth and Orvik 2002; LaCasce 2005; Koszalka et al. 2009). Our representation derives from a 1-year simulation with the 4-km MIPOM model of the Norwegian Meteorological Institute. This produces fairly realistic velocity fields (LaCasce and Engedahl 2005). The velocities were resampled on a regular grid of 0.25° × 0.25° and are contoured in Fig. 2a. The means were then interpolated onto the particles' instantaneous positions for advection.

The model also requires the root mean square (rms) velocity, ν, and the Lagrangian integral time scale, T_L. Based on earlier estimates (Poulain et al. 1996; LaCasce 2005; Andersson et al., submitted for publication), we assign values of ν = 20 cm/s and T_L = 1 day (see footnote 1). This yields an effective length scale L = νT_L = 18 km, comparable to the core width of the Norwegian Atlantic Current. For simplicity, we assume that the eddy statistics are isotropic and homogeneous. Koszalka et al. (2009) used a similar stochastic model for comparison with drifter trajectories in the same region.
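A single trajectory of the model in Eq. 3 can be generated with a simple Euler–Maruyama step, as in the sketch below. It is a minimal illustration rather than the code used for the paper: the function name `simulate_particle`, the default zero mean flow, the starting position, and the unit choice (days and km, so that ν = 20 cm/s becomes 17.28 km/day) are assumptions made for the example.

```python
import math
import random

def simulate_particle(n_steps, dt, nu, T_L,
                      mean_flow=lambda x, y: (0.0, 0.0), rng=random):
    """Integrate Eq. 3 with an Euler-Maruyama step.

    Units are assumed to be days (time), km/day (velocity), and km (position).
    mean_flow(x, y) returns (U, V); the default is zero mean flow.
    """
    x = y = 0.0
    # Start from the stationary velocity distribution, N(0, nu^2).
    u = rng.gauss(0.0, nu)
    v = rng.gauss(0.0, nu)
    track = [(x, y, u, v)]
    amp = math.sqrt(2.0 / T_L) * nu   # noise amplitude in Eq. 3
    sqdt = math.sqrt(dt)              # Wiener increment: dw ~ N(0, dt)
    for _ in range(n_steps):
        U, V = mean_flow(x, y)
        x += (u + U) * dt
        y += (v + V) * dt
        # Independent noise for the two velocity components.
        u += -u / T_L * dt + amp * sqdt * rng.gauss(0.0, 1.0)
        v += -v / T_L * dt + amp * sqdt * rng.gauss(0.0, 1.0)
        track.append((x, y, u, v))
    return track

# 60 days at a step of 0.01 day, with nu = 20 cm/s = 17.28 km/day.
rng = random.Random(0)
track = simulate_particle(n_steps=6000, dt=0.01, nu=17.28, T_L=1.0, rng=rng)
```

A spatially varying mean flow would be supplied through `mean_flow`, for instance by interpolating a gridded field to the particle position.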

Two thousand particles were deployed on a regularly spaced grid and advected for 60 days, yielding ca. 105,000 drifter days. This is comparable to the number of actual drifter days currently available in the Nordic Seas; however, the areal coverage in the synthetic set is much more uniform. Seeding on a uniform grid also reduces the "array bias", which can influence the diffusivities (Davis 1991). Some particles collided with the coast or islands, and we discarded the subsequent portions of those trajectories.

The model time step was dt_o = 0.01 day, and the data were saved with a time step of dt = 0.1 day (one tenth of the integral time). The resulting trajectories are plotted in Fig. 2b. For comparison, we ran an additional simulation with 2,000 stochastic particles with zero mean flow (U = 0, V = 0), all other parameters being the same.

Footnote 1: Poulain et al. (1996) found T_L = 1–3 days here, while Andersson et al. (submitted for publication) estimated T_L = 1.1 days. LaCasce (2005) found that the Eulerian integral time is 1 to 2 days, which implies an equal or shorter Lagrangian time.

Fig. 2 (a) Magnitude of the mean velocity field |U(x, y)| = √(U² + V²) (centimeters per second) from a MIPOM model simulation of the Nordic Seas used to advance stochastic particles according to Eq. 3. (b) Trajectories from 2,000 synthetic particles evolved for 60 days with a first-order stochastic model embedded in this mean flow. Deployment positions are marked with circles

3 Mean velocities

We focus first on extracting the mean velocities from the drifters. The resulting estimates will be compared to the actual U, V values from the MIPOM simulation (used as input to generate the trajectories). We have velocities with a time step of dt = T_L/10, but we use only a subset of these for calculating the means, with dt = 2T_L. Then each observation is treated as independent.

3.1 Methods

For binning the velocities, we must first choose the bin sizes. The bins should be small enough to resolve the mean flow but larger than the eddy scale. They should also be large enough to yield statistically significant estimates. The Nordic Seas is problematic in this regard because the mean and the eddy scales are comparable. Previous authors used (2° × 1°) bins in this region (Poulain et al. 1996; Saetre 1999; Jakobsen et al. 2003) (see footnote 2). Such bins have a length scale of roughly 100 km. We denote this as our "intermediate" bin size. In addition, we examine larger and smaller bins, with dimensions (4° × 2°) and (1° × 0.5°), respectively.

Footnote 2: The dimensions are listed as (degrees longitude × degrees latitude). With (2° × 1°), the bins are close to square in the southern part of the domain but are more rectangular in the north.

For the clustering, we employ the "k-means" clustering algorithm (Lloyd 1982). The algorithm partitions the n_T observations (x_1, x_2, ..., x_{n_T}) into k subsets (clusters), S = {S_1, S_2, ..., S_k}, such that each observation is assigned to the nearest cluster in a way that minimizes the sum, over all clusters, of the squared distance between cluster members and the cluster center μ_i:

$$\min \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2. \tag{5}$$

As the cluster centers themselves depend on the positions of the observations, this is necessarily done iteratively, in a two-step assignment/update process. In the assignment step, each data point is assigned to the nearest center. In the update step, cluster centers are adjusted to match the sample means of their member data points. This is repeated until the assignments are unchanged. For more information on clustering algorithms, see, e.g., Kanungo et al. (2002) and MacKay (2003).
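The two-step assignment/update iteration can be sketched as below. This is the plain Lloyd iteration on two-dimensional positions, not the modified equal-membership variant the paper describes in its Appendix; the function name `kmeans`, the random initialization from the data, and the toy points are assumptions made for illustration.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign to nearest center, then update centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize centers from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                        + (p[1] - centers[i][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:             # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:                      # keep the old center if empty
                centers[i] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, labels

# Toy data (assumed): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
```

Plain k-means does not constrain the number of members per cluster, which is why a modification is needed to obtain clusters of approximately m observations.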

The main parameter to be specified is k, the number of clusters. If we wish to have clusters with m members, then k = n_T/m. As with the bins, we use three choices, ranging from coarser to finer resolution. We chose m so that the mean standard error among the clusters was the same as that in the corresponding bins. The error is defined:

$$\langle\sigma\rangle = \left\langle \frac{\nu}{\sqrt{n}} \right\rangle, \tag{6}$$

where ν and n are the rms velocity and the number of realizations in the bin/cluster and the brackets indicate an average over all the bins/clusters. Alternatively, we could have chosen m to match the mean number of observations in the bins, but the latter varies widely among bins, as will be seen. Matching mean errors yields clusters with m = 125, m = 75, and m = 45 members. To guarantee that all the clusters have approximately m observations, we modified the k-means algorithm (as described in the Appendix).
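The effect of uneven sampling on the mean standard error of Eq. 6 can be illustrated with a short calculation. The sketch below uses a single rms velocity for all groups, which is a simplification (in Eq. 6, ν is evaluated per bin or cluster), and the observation counts are hypothetical values chosen only so that both cases contain the same total number of observations.

```python
import math

def mean_standard_error(nu, counts):
    """Eq. 6: average of nu / sqrt(n) over bins or clusters with n > 0."""
    errs = [nu / math.sqrt(n) for n in counts if n > 0]
    return sum(errs) / len(errs)

nu = 20.0                        # rms velocity in cm/s (value from the text)
uneven = [256, 64, 16, 64]       # hypothetical skewed bin counts (total 400)
even = [100, 100, 100, 100]      # hypothetical equal-size clusters (total 400)

err_bins = mean_standard_error(nu, uneven)       # 2.8125 cm/s
err_clusters = mean_standard_error(nu, even)     # 2.0 cm/s
```

For the same total number of observations, the evenly populated groups give the smaller mean error, because 1/√n penalizes sparsely sampled groups more than it rewards densely sampled ones.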

The various parameters for the bins and clusters are shown in Table 1. Note that the "coarse" bins are roughly twice as large as the coarse clusters and have nearly twice as many observations, on average. The "fine" bins and clusters are more comparable in both regards.

3.2 Results

Shown in Fig. 3 are the means obtained by bin-averaging (panels a–c) and by clustering (panels d–f). In the lower panels, the clustered means are linearly interpolated onto the same grid as for the input model field (panels g–i), for comparison with the actual mean flow, in Fig. 2a.

Consider the bins first (panels a–c). With the finest resolution (1° × 0.5°), the major structures in the surface current are recovered. These include the inflow north of Iceland and the inner and outer branches of the Norwegian Atlantic Current (e.g., Orvik and Niiler 2002). With the (2° × 1°) bins, we observe where the currents are stronger and weaker but lose much of the finer structure. The currents with the (4° × 2°) bins are hard to recognize.

The results from clustering are shown in panels d–f. With m = 45, the means are comparably well-resolved as those in the finest resolution bins, with the exception of the currents along the northern periphery (which are not resolved here but marginally seen with the binned set). But the m = 75 and m = 125 clusters are also fairly successful at capturing the mean flow structure. The primary difference is that, with larger m, there are fewer clusters.

Of course, part of the difference between the clustered means and the actual field (Fig. 2a) is due to the uneven plotting with the former. Interpolating the clustered means onto the same (0.25° × 0.25°) grid as for the input mean flow yields the fields in the lower panels of Fig. 3. We see that the primary structures are captured in the clustered means, even with m = 125 (Fig. 3g). Interpolating the binned means on the other hand produces smoothed versions of those fields (not shown) and produces results comparable to the input field only with the (1° × 0.5°) bins.

Figure 4 shows further how the statistics vary between the two methods. In panels a and b, we plot the distributions of the number of observations in the bins and clusters, respectively. While the largest bins have many observations, the majority have far fewer. Thus, the distributions are skewed and the mean number of observations in the bins (Table 1) is not representative of the majority. The clusters on the other hand have nearly a delta-function distribution; all the clusters have approximately m observations, by design.

A second difference is seen in Fig. 4c, which shows the size of the bins and clusters as a function of the mean standard error. For a given error, the average bin covers a larger area than the corresponding cluster. Moreover, the area covered by the clusters is less sensitive to the mean error than with the bins. Both points follow from the differences in the numbers of observations. Since the clusters have roughly the same number of observations, it is easier to control the error. But the errors in the bins vary widely, just as the numbers of observations do. Since the clusters cover smaller areas, they are more successful at capturing the finer-scale structures in the mean.

The standard error determines the significance of the means in the bins/clusters. In Fig. 5, we examine where the calculated means differ significantly from the actual means, averaged over the same areas. Bins in which the means are not statistically different are shown in blue while the purple bins indicate a significant difference (panels a–c). Panels d–f show the corresponding fields for the clusters.

Table 1 Parameters of the binning and clustering assignments

             Bins                                                                  Clusters
Resolution   Long × Lat (°)   Length (km)   No. of bins   <n>   <σ> (cm/s)         m     Nc      Dc (km)   <σ> (cm/s)
Coarse       4 × 2            186           61            865   1.8                125   452     80        1.9
Medium       2 × 1            93            225           235   2.5                75    775     54        2.5
Fine         1 × 0.5          47            839           63    3.5                45    1,356   34        3.3

Columns: bin size (long × lat), bin length scale in kilometers (square root of the area covered by the bin), number of bins, average number of observations in bins, mean standard error in the bins, number of members in cluster, number of clusters, mean cluster diameter, and mean standard error in clusters

Fig. 3 Pseudo-Eulerian estimate of the mean speed |U(x, y)| derived through averaging of the synthetic Lagrangian observations. Top: obtained by binning the data in grids with varying bin size, 4° × 2° (a), 2° × 1° (b), and 1° × 0.5° (c). Bins with no data are plotted in gray. Middle: obtained by clustering the data with different numbers of members, m = 125 (d), m = 75 (e), and m = 45 (f). Bottom: clustered estimates interpolated onto a regular grid of (long × lat) = 0.25° × 0.25°, m = 125 (g), m = 75 (h), and m = 45 (i)

One might expect that, because increasing the bin/cluster area increases the number of observations in them, this would likewise reduce the errors. But the percentage of rejected bins is actually greater with larger bins and clusters than with smaller ones. The reason for this lies with the mean flow. Because the mean is so inhomogeneous, using a larger bin involves averaging over a wider range of U, V values. The standard error is smaller because the number of observations is greater, making it less likely that the two estimates are statistically the same. In a sense, the larger bins produce a more certain answer of an incorrect velocity. With smaller bins, the sampled mean is more homogeneous and the error larger, increasing the probability of reproducing the mean flow fields correctly in the bin area.
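The point that a larger sample can make a biased mean more likely to be rejected can be illustrated with a simple significance check. The sketch below assumes a normal approximation with z = 1.96 for the 95% level, which may differ from the exact test used for Fig. 5; the function name `mean_is_rejected` and the toy samples (a fixed 1 cm/s bias with ±20 cm/s scatter) are hypothetical.

```python
import math

def mean_is_rejected(sample, true_mean, z=1.96):
    """Return True if the sample mean differs from true_mean at ~95% level.

    Uses a normal approximation (assumed here): reject when the difference
    exceeds z times the standard error of the mean.
    """
    n = len(sample)
    mean = sum(sample) / n
    var = sum((s - mean) ** 2 for s in sample) / (n - 1)
    stderr = math.sqrt(var / n)
    return abs(mean - true_mean) > z * stderr

# Both samples have the same 1 cm/s bias relative to a true mean of zero.
small = [21.0, -19.0] * 10       # n = 20: large standard error
large = [21.0, -19.0] * 2000     # n = 4000: small standard error

rejected_small = mean_is_rejected(small, 0.0)   # False
rejected_large = mean_is_rejected(large, 0.0)   # True
```

With the same bias, only the larger sample is rejected: its standard error has shrunk below the bias, which is the sense in which a large bin gives "a more certain answer of an incorrect velocity".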

A larger proportion of bins than clusters are rejected for a given mean standard error (Fig. 5g). This is again because the clusters cover smaller areas. The three types of cluster used produce a rejection rate between 14% and 22%, while 25–60% of the bins are rejected. Thus, the clusters are more successful at capturing the actual means.

Fig. 4 (a) Distributions of the number of independent observations grouped in bins of different size. (b) Distributions of the number of independent observations in clusters obtained with different parameter m. (c) Mean length scale (square root of the area covered by the bin and cluster diameter) vs. mean sampling error for binning and clustering analyses

Fig. 5 Comparison of the means in the bins (a–c) and clusters (d–f) with the actual means, U, V, used in generating the particle trajectories. Purple color codes bins/clusters that have means which are different from the actual means at the 95% confidence level, and blue color indicates means that are the same. (g) Percentage of bins and clusters where the means are significantly different ("rejected")

We would obtain different results had we used a different metric for comparing the bins and clusters. For instance, if we match the mean number of observations, we obtain larger clusters and a less well-resolved mean. But, as noted earlier, the mean number of observations is not representative for the bins, due to their skewed distributions of observations. This is because the bins, unlike the clusters, are not necessarily where the data are.

4 Diffusivities

4.1 Diffusivities with zero mean flow

Now we turn to the eddy diffusivities. There are several technical issues to be addressed. The first is how the diffusivity is actually calculated. Some compute it from the integral of the ensemble-mean velocity autocorrelation (Eq. 2; e.g., Poulain et al. 1996; Poulain 2001; Thompson et al. 2009). Others prefer the product of the residual velocity and displacement (e.g., Swenson and Niiler 1996; Zhurbas and Oh 2003), while some compute half the derivative of the mean absolute dispersion (Colin de Verdiere 1983). The different approaches are not often compared (Zhurbas and Oh 2003). We will do so briefly here, using the stochastic trajectories with zero mean flow. Without a mean flow, the residual velocities are the same as the particle velocities and are homogeneous.

The diffusivity estimates, κ(t), from the three methods are shown in Fig. 6a. Also shown is the theoretical curve, obtained by integrating the exponential autocorrelation for a first-order stochastic process:

$$\kappa(t) = \nu^2 \int_{-t}^{0} \exp\left(-|t'|/T_L\right) dt' = \kappa_\infty \left(1 - \exp\left(-\frac{t}{T_L}\right)\right). \tag{7}$$

With T_L = 1 day and ν = 20 cm/s, the asymptotic limit is κ_∞ = ν²T_L = 3.46 × 10^7 cm²/s.
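The closed form in Eq. 7 and the quoted asymptotic value can be checked numerically. The following sketch integrates the exponential autocorrelation of Eq. 4 with the trapezoid rule and compares it with the closed form; the function names and the number of integration steps are arbitrary choices made for the example.

```python
import math

def kappa_theory(t, nu, T_L):
    """Eq. 7: kappa(t) = nu^2 * T_L * (1 - exp(-t / T_L))."""
    return nu ** 2 * T_L * (1.0 - math.exp(-t / T_L))

def kappa_numeric(t, nu, T_L, n=10000):
    """Trapezoid-rule integral of nu^2 * exp(-tau / T_L) from 0 to t."""
    h = t / n
    total = 0.5 * (1.0 + math.exp(-t / T_L))
    total += sum(math.exp(-i * h / T_L) for i in range(1, n))
    return nu ** 2 * h * total

nu = 20.0             # rms velocity, cm/s
T_L = 86400.0         # integral time scale: 1 day in seconds
kappa_inf = nu ** 2 * T_L                             # 3.456e7 cm^2/s
frac_at_4_days = kappa_theory(4 * T_L, nu, T_L) / kappa_inf   # about 0.98
```

After four integral times the theoretical curve has reached about 98% of κ_∞, consistent with the convergence after 3 to 4 days noted in the text.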

The derivative of the absolute dispersion and the product of the velocity and displacement produce the same result, within the errors. The diffusivities asymptote to the theoretical limit after 3 to 4 days but exhibit significant oscillations thereafter. The integral of the autocorrelation on the other hand yields a smoother curve, and this lies near the theoretical curve. The reason this differs from the other two is that integrating the mean autocorrelation is a smoothing operation, so much of the variability seen in the other curves is removed; Davis (1991) concluded the same. We will use this method exclusively hereafter.
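The integrated-autocorrelation estimate can be sketched on synthetic velocities as follows. This is an illustrative simplification, not the paper's procedure: velocities are drawn with an exact AR(1) update of the first-order process in nondimensional units (ν = 1, T_L = 1), the autocorrelation is taken relative to the first sample of each series only, and the names `synthetic_velocities` and `diffusivity_from_autocorr` are hypothetical.

```python
import math
import random

def synthetic_velocities(n_particles, n_steps, dt, nu, T_L, seed=0):
    """Exact AR(1) sampling of the first-order (Ornstein-Uhlenbeck) velocity."""
    rng = random.Random(seed)
    a = math.exp(-dt / T_L)                  # one-step correlation
    b = nu * math.sqrt(1.0 - a * a)          # keeps the variance at nu^2
    series = []
    for _ in range(n_particles):
        u = rng.gauss(0.0, nu)               # stationary initial condition
        us = [u]
        for _ in range(n_steps):
            u = a * u + b * rng.gauss(0.0, 1.0)
            us.append(u)
        series.append(us)
    return series

def diffusivity_from_autocorr(series, dt):
    """Integrate the ensemble-mean autocorrelation <u(0) u(tau)> in time."""
    n_lags = len(series[0])
    P = [sum(us[0] * us[k] for us in series) / len(series)
         for k in range(n_lags)]
    kappa = [0.0]
    for k in range(1, n_lags):               # cumulative trapezoid rule
        kappa.append(kappa[-1] + 0.5 * (P[k - 1] + P[k]) * dt)
    return kappa

# 2,000 particles, 10 integral times, sampled every 0.1 T_L.
series = synthetic_velocities(2000, 100, 0.1, nu=1.0, T_L=1.0)
kappa = diffusivity_from_autocorr(series, 0.1)
```

In these units the last value of `kappa` should lie near ν²T_L = 1, with scatter that grows as the number of particles is reduced.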

There are two additional points. The first is that the curves in Fig. 6a derive from 2,000 particles, an enormous number in relation to most observational studies. Such experiments typically have at best an order of magnitude fewer, and this affects the convergence. Examples with fewer particles, using the integrated autocorrelation, are shown in Fig. 6b. With an ensemble of 100 particles, the diffusivity estimate is within 10% of the theoretical value. The asymptote can be approximately correct with fewer particles, but the errors are larger.

Second, because the diffusivities should converge after 3 to 4 days, we require track segments of at least that length to obtain proper estimates. Shown in Fig. 6c are the integrals obtained with 100 trajectories of varying length. For tracks of 5 days or less, the curves asymptote to values below the theoretical limit. Evidently track lengths of 10 days, or ten times the integral time, are required to obtain reasonable estimates.

So even in the best case scenario with no mean flow, a meaningful estimate of the eddy diffusivity requires 100 track segments of at least 10 T_L duration. Knowing this helps interpret the subsequent results with the mean flow restored.

4.2 Diffusivities with a mean flow

Now consider the diffusivities with the mean flow present. We perform the calculations using the three bin and cluster classes discussed previously. For the means, we use averages obtained in the fine resolution cases, i.e., from the (1° × 0.5°) bins and from the m = 45 clusters. Although the mean standard errors are larger with these cases, they best capture the detailed flow structure (Fig. 3). We linearly interpolated those means onto the instantaneous drifter positions to obtain the residual velocities.

[Figure 6: three panels of κ (10^7 cm^2/s) versus time (days). Legend entries: a) theor, adisp, ud, autocorr; b) 5, 50, 100, 500; c) 60, 20, 10, 5, 2, 1.]

Fig. 6 a Diffusivity curves derived from 2,000 particles evolving for 60 days in a zero mean flow by mean sequences of the time derivative of the absolute dispersion (adisp), mean products of the single-particle velocity and its displacement (ud), and integration of the ensemble averages of the autocorrelation sequences (autocorr), compared to the theoretical value (theor). Estimation errors δκ are derived from errors on the autocorrelation given by the t-test at the 95% significance level. b Diffusivity curves from the autocorrelation method computed with a varying number of particles, each time series being 60 days long, compared to the theoretical curve drawn in black. c Diffusivity curves from the autocorrelation method computed for 100 particles with a varying length of the time series. The theoretical curve is drawn in black

For the bins, we use only those segments of drifter trajectories while the drifters were in the bins. These were of varying length, as the drifters spent different times in the bins. We averaged the autocorrelations from the individual tracks to obtain the bin diffusivity (Davis 1991). We did this for each of the three bin classes (Table 1).
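One way to extract the in-bin segments is sketched below. It assumes positions in degrees sampled at a fixed interval and bins aligned to multiples of the bin size; the helper `in_bin_segments` is hypothetical and only illustrates the bookkeeping, not the code used in the study.

```python
import numpy as np

def in_bin_segments(lon, lat, dlon, dlat):
    """Split one trajectory into runs of consecutive fixes that share a bin.

    Returns a list of ((ix, iy), start, stop) with stop exclusive.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    if lon.size == 0:
        return []
    ix = np.floor(lon / dlon).astype(int)     # bin index in longitude
    iy = np.floor(lat / dlat).astype(int)     # bin index in latitude
    change = (np.diff(ix) != 0) | (np.diff(iy) != 0)
    starts = np.concatenate(([0], np.flatnonzero(change) + 1))
    stops = np.concatenate((starts[1:], [ix.size]))
    return [((int(ix[s]), int(iy[s])), int(s), int(e))
            for s, e in zip(starts, stops)]

# A drifter that visits one 1 x 0.5 degree bin, moves east, then returns.
segments = in_bin_segments([0.1, 0.2, 1.1, 1.2, 1.3, 0.5],
                           [60.1] * 6, dlon=1.0, dlat=0.5)
lengths = [stop - start for _, start, stop in segments]
print(lengths)
```

Because a drifter can re-enter a bin, the same bin may contribute several short segments, which is one reason the segment lengths vary so much.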

With the clusters, we essentially reverse the procedure. First, we break all trajectories into segments of a chosen, uniform time length. Then we calculate the autocorrelations for each segment. The segment is assigned to a position (the midpoint along the track), and those positions are clustered as in Section 3.2. Then the autocorrelations for all segments in the cluster are averaged and integrated. We chose the number of members in the cluster to be 100. With 10-day segments, this yielded 122 clusters, with a mean radius of 76 km. With 20-day segments, we obtained 62 clusters with a mean radius of 90 km.
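The segmentation step can be sketched as follows. The sketch assumes non-overlapping segments, discards any leftover tail shorter than one segment, and takes the fix at the middle time index as the midpoint; the text does not specify these details, and `uniform_segments` is a hypothetical helper.

```python
import numpy as np

def uniform_segments(track, seg_len):
    """Cut an (n_obs, 2) track into non-overlapping pieces of seg_len fixes.

    Returns the pieces, shape (n_seg, seg_len, 2), and the position at the
    middle time index of each piece, shape (n_seg, 2).
    """
    track = np.asarray(track, dtype=float)
    n_seg = len(track) // seg_len             # leftover tail is discarded
    pieces = track[: n_seg * seg_len].reshape(n_seg, seg_len, 2)
    midpoints = pieces[:, seg_len // 2, :]
    return pieces, midpoints

# 25 daily fixes (lon, lat) cut into 10-day segments: two pieces, 5 fixes dropped.
track = np.column_stack((np.arange(25.0), 60.0 + 0.1 * np.arange(25.0)))
pieces, midpoints = uniform_segments(track, seg_len=10)
print(pieces.shape, midpoints)
```

The midpoints from all trajectories are what would then be passed to the clustering step.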

[Figure 7: five map panels, a) 4 × 2, b) 2 × 1, c) 1 × 0.5, d) 20 days, e) 10 days; axis ticks run from −15 to 15 and from 62 to 76.]

Fig. 7 Maps of eddy diffusivity, scaled by the target theoretical value, derived from the synthetic particles obtained by the binning method for different bin sizes, 4° × 2° (a), 2° × 1° (b), and 1° × 0.5° (c), and by the clustering method for different segment lengths, 20 days (d) and 10 days (e). All estimates were interpolated onto a regular grid of (long × lat) = 0.25° × 0.25° prior to plotting

Thus, with the bins and clusters, we obtain time series of the diffusivities. The question then is how to estimate the asymptotic value, κ∞. Ideally, we would take the value as t approaches infinity. But this is impractical because the particles leave the bin after a finite period of time and also because the sampling error increases as $t^{1/2}$ (Davis 1991). A number of authors take the first maximum value of the series, which is similar to integrating the autocorrelation to the first zero-crossing (e.g., Brink et al. 2000; Lumpkin et al. 2002; Rupolo 2007). However, the exponential autocorrelation obtained with the stochastic model theoretically has no zero crossing at finite lag. So instead, we average the diffusivity over a fixed period, from 4 to 8 days. If the mean autocorrelation is shorter than 8 days in a given bin, the integration terminates. If it is shorter than 4 days, no estimate of κ∞ is produced. The results do not change qualitatively when using other choices for the averaging period (e.g., from 5 to 10 days).
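The averaging rule can be written compactly. The sketch below is one reading of it, with a hypothetical function name: the diffusivity series is averaged over whatever part of the 4 to 8 day window is available, and no estimate (NaN) is returned when the series does not reach 4 days.

```python
import numpy as np

def kappa_infinity(lags, kappa, t_min=4.0, t_max=8.0):
    """Average kappa over lags in [t_min, t_max]; NaN if no lag reaches t_min."""
    lags = np.asarray(lags, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    window = (lags >= t_min) & (lags <= t_max)
    if not window.any():
        return float("nan")                   # series too short: no estimate
    return float(kappa[window].mean())

lags = np.arange(0.0, 10.5, 0.5)              # days
kappa = 1.0 - np.exp(-lags)                   # idealized curve with T_L = 1 day
print(kappa_infinity(lags, kappa))            # full 4 to 8 day window
print(kappa_infinity(lags[:13], kappa[:13]))  # series ends at 6 days
print(kappa_infinity(lags[:7], kappa[:7]))    # series ends at 3 days
```

Changing `t_min` and `t_max` gives the alternative averaging periods mentioned in the text.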

The resulting estimates for κ∞ are mapped in Fig. 7a–c for bins of different size and in panels d and e for clusters with 100 track segments of 20 and 10 days, respectively. We normalize κ∞ by its theoretical value. For consistency, all estimates are interpolated onto a regular grid of 0.25° × 0.25° and contoured with the same range of values, from 0 to 1.5. The correct value is 1.0, which is contoured in yellow.

The normalized estimates with the (4° × 2°) bins span the range from near 0 to 1.3. Values that are too low are found near the borders of the domain, and values that are too large occur near the coasts. In the interior, the values are consistently low, with typical values of 0.8–0.9. With the (2° × 1°) bins, the diffusivity exhibits smaller-scale variations, and there are many regions in the interior where the values are too large. The variations are more marked with the (1° × 0.5°) bins, with pockets of high and low values.

[Figure 8: a) MTL (days) versus number of segments (logarithmic axis, 10^0 to 10^3), legend 1 × 0.5, 2 × 1, 4 × 2, 10, 20; b) and c) horizontal axis showing bin size (degrees) 1, 2, 4 and segment length (days) 10, 20. Annotated values: b) 0.69, 0.86, 0.85, 1, 1; c) 0.54, 0.36, 1.29, 1.9, 3.03.]

Fig. 8 a A scatter plot showing the number of segments and mean segment length obtained in bins for different bin sizes, 4° × 2° (cyan), 2° × 1° (blue) and 1° × 0.5° (green), for the binning method, and for different prescribed segment lengths (10 and 20 days, red) for the clustering method. The mean values of these parameters over all bins/clusters are marked with rectangles and circles, respectively. The number of segments refers to τ = 0 and it falls off thereafter due to the variable length of the tracks that occur in the bin. b A spread of estimates of eddy diffusivity κ∞, scaled with the "target" theoretical value in binning and clustering assignments. c The error of the diffusivity estimate ⟨δκ⟩, averaged over all bins/clusters

The diffusivities with the clusters are more uniform, both for the 20- and 10-day segments. The extreme low estimates found with the bins do not occur. Instead, the values vary between 0.8 and 1.2. There are larger values along the periphery, but also in the interior.

A detailed comparison of the bin/cluster statistics is shown in Fig. 8. Panel a is a scatter plot of the number of segments against the average length of the segments used in calculating the autocorrelation for each cluster or bin. The clusters have segment lengths of 10 and 20 days, by design. The bins have a range of values, but in most cases, the average length is below 7 days. None exceed 10 days. The mean over all the bins is 5, 3, and 1.5 days, with decreasing bin size.

The second point concerns the number of segments. Again, the clusters have nearly the same number. There are small variations, as the clustering procedure could not always obtain 100 segments. Nevertheless, most clusters have 80–120 segments. The bins on the other hand exhibit a wide range of values. There are some (4° × 2°) bins with over 700 segments and others with fewer than 10. And there are some (1° × 0.5°) bins with only two or three segments. The average number of segments is 280, 131, and 64 for the bins, in order of decreasing area.

Based on the findings in Section 4.1, we expect that the binned estimates of κ∞ should vary more and be biased low because the segments are generally too short. This is the case. Shown in Fig. 8b are scatter plots of the diffusivities for the five cases. The bin estimates span the range from zero to 1.5 times the actual diffusivity. The spread is less with the larger bins but still pronounced. In all cases, the diffusivities are skewed toward low values. Thus, the average diffusivities for all bins are also low.

The clusters on the other hand yield estimates from 0.8 to 1.2 times the actual diffusivity. The distributions are not skewed, so the averages over all the clusters, both with 10- and 20-day segments, agree with the actual diffusivity.

In Fig. 8c, we plot the diffusivity errors. These derive from Student's t-test at the 95% significance level, averaged over 4–8 days and over all bins/clusters and normalized by the theoretical value of κ∞. The errors are the largest with the (1° × 0.5°) bins and decrease with increasing bin size. However, both cluster examples have significantly smaller errors. The mean error is 0.36 times the actual diffusivity with 20-day segments, as compared with 1.29 times the diffusivity for the "best" binning case.

Other methods for determining the diffusivity yield similar results. Using the zero-crossing method for estimating κ∞ yields a similar range of estimates, albeit with slightly larger average diffusivities. The diffusivities are nevertheless skewed to smaller values.

The primary shortcoming with the binning calculation is that the segments are too short. With small bins, there are few particles which remain in any bin for periods longer than T_L. Thus, the mean autocorrelation curves do not reach the asymptotic period (Fig. 6c). An alternative approach, in line with that of Garraffo et al. (2001), Lumpkin and Flament (2001), Lumpkin et al. (2002), and Rupolo (2007), would be to break the trajectories into uniform segments and regroup them in bins. Then one could control the length of the autocorrelations, just as we have done for the clusters. But by grouping in bins, we would still obtain different numbers of observations in different bins, as we found with the mean velocities.

5 Summary and discussion

We considered a new method for calculating pseudo-Eulerian mean velocities and eddy diffusivities from Lagrangian data. This involves grouping a specified amount of data into spatially localized subsets using a "clustering" algorithm (e.g., Lloyd 1982; MacKay 2003). This is in contrast to the commonly used method in which the data is separated into geographical bins of a specified size. We compared the two approaches by analyzing a set of 2,000 trajectories generated with a first-order stochastic model, with a mean velocity representative of that at the surface in the Nordic Seas and with comparable eddy parameters.

Using bins yields Eulerian estimates on a uniform grid. But as the number of observations varies greatly from bin to bin, so does the statistical significance. Clustering on the other hand produces sets with roughly the same number of observations and trajectory segments of the same length. The resulting means and diffusivities are not uniformly spaced but have much more uniform statistics.

In terms of the mean velocities, clustering produces regions of smaller areal extent than binning, for comparable mean standard errors. The bins have widely different numbers of observations but the clusters have nearly the same, allowing more control of the significance. With smaller areas, the clusters are better able to resolve details of the mean flow. Further, the accuracy is less dependent on the mean standard error with clusters than it is with bins.

The means are more accurate with smaller bins, despite the smaller numbers of observations. Binning with a cell size of (2° × 1°), as done previously for the Nordic Seas (Poulain et al. 1996; Saetre 1999; Jakobsen et al. 2003), yields a smooth representation of the mean. Using smaller bins, however, increases the chances of individual bins being rejected for having too few observations (e.g., Poulain et al. 1996; Falco et al. 2000; Thompson et al. 2009). Clustering provides a way around this by allowing the number of observations to be specified a priori.

Diffusivities are a more Lagrangian measure than the means, involving an integral along drifter paths. With bins, these segments are of varying length, which impacts the averages. One often finds too many short segments, and this leads to an underestimate of the diffusivity. With clustering, one specifies a priori how long and how many trajectory segments are used for the averages. The resulting diffusivity estimates exhibit less variation than with the bins and moreover are not skewed toward low values.

Of course the mean and diffusivity calculations are closely related because the means are subtracted from the trajectories prior to calculating the diffusivities. If the means are calculated with bins which are too large, integrals of the resulting residual velocities may not converge (Swenson and Niiler 1996). With clusters, the areal coverage is typically less and the means apply where the trajectories are, so the residual velocities are better captured.

We clustered data according to nearest-neighbor distance, but other choices are also possible. One could for instance group data according to distance along an isopycnal or to position vis-à-vis topography (LaCasce 2000). In addition, we treat each observation equally, but one can weight the observations, for instance with regard to errors on individual positions. Such alterations in the k-means algorithm are straightforward.

A related issue is that of "array bias", in which nonuniform deployments can produce errors in the diffusivities (Davis 1991). While this is often less of a problem than sampling error (Poulain et al. 1996; Garraffo et al. 2001), it is nevertheless an issue with in situ data. Here too, the clustering approach is preferable because diffusivities are determined locally, where the trajectories are. We do not map onto a uniform grid, which would introduce variations in coverage.

However, this mapping onto an irregular grid may be seen as a shortcoming of the clustering approach. If the means and diffusivities are to be used in a model, they must necessarily be interpolated onto a regular grid. In the present case, this interpolation produced reasonable results (Fig. 3g–i) because the data coverage was uniform (Fig. 2). But this is not usually the case with in situ sets. Nevertheless, the procedure of mapping the nonuniform cluster averages onto a regular grid reminds the user of where the data actually is. With binned estimates, this can be less obvious.

A reviewer pointed out that we have avoided the question of time dependency in the mean flow. Indeed, the diffusivity is proportional to the lowest frequency in the Lagrangian spectrum (e.g., LaCasce 2008), and the mean velocity is ideally the component with zero frequency. In regions with pronounced seasonal and/or interannual variability, it is common to segregate the data into climatological groups of several months or years, often combined with filtering in the frequency domain (e.g., Swenson and Niiler 1996; Jakobsen et al. 2003; Sallee et al. 2008). More sophisticated techniques have also been proposed (e.g., Lumpkin 2003). Such processing would in any case be done prior to the proposed clustering, which is really a segregation in space.

In a coming study, we apply the clustering method to drifter data from the Nordic Seas. Preliminary calculations suggest that clustering yields a similarly improved representation of the mean flow and the diffusivities. The primary challenge with the in situ data, in comparison with the present stochastic set, is that the eddy field is also strongly inhomogeneous. So more care is required.

Acknowledgements The work is part of the Poleward project, funded by the Norwegian Research Council Norklima program (grant number 178559/S30). Details are found on http://www.iaoos.no/ and http://folk.uio.no/ingako/my_files/POLEWARD_WEBPAGE_MAIN.html. Harald Engedahl provided the MIPOM velocities. We appreciate useful comments from two anonymous reviewers.

Appendix: The clustering algorithm

We base our clustering procedure on a generalized version of Lloyd's (1982) algorithm for the problem described by Eq. 5. However, contrary to conventional applications of k-means (MacKay 2003), in our problem, the number of clusters k does not need to be guessed at, but it is deduced from the total amount of data to match the desired number of cluster members m. Hence, we have developed here a procedure to partition the data into clusters with the number of members being as close as possible to a prescribed value m. This heuristic numerical solution is possibly not an optimal one, but it performed well for the purpose of this study. The implementation is done with the MATLAB k-means toolbox, modified accordingly. The steps of the algorithm are as follows:

• Choose the desired number of members in a cluster, m

• Given the total number of independent observations n and m, compute the target number of clusters, k = n/m

• Start k-means procedure ("batch phase")

  – A set of k cluster centers is randomly seeded

  – Assign each point to the nearest cluster center, minimizing the squared Euclidean distance in geographical coordinates (Eq. 5)

  – Recompute the new cluster centers

  – The two previous steps are repeated until the convergence criterion is met (the assignment has not changed or the maximum number of iterations is reached, set to be 200 here)

  – The four previous steps are repeated 100 times (for 100 initial seedings, or "replicates") and the "best solution" (global minimum, that is, the lowest value of the sum of within-cluster distances, summed over all clusters) is the output

• End k-means procedure

• Clusters with the desired number of members are removed from consideration and stored, while the entire clustering procedure is repeated on the smaller data set. The process continues until all the data are grouped in clusters whose number of members lies in (m − 5, m + 5), or until the maximum number of iterations, 400, is reached. The requirement was not met in some subsets, which typically concerned clusters peripheral to the data-covered area. These were still included in the further analysis, making the distribution curves in Fig. 4b differ from delta-functions.
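The steps above can be sketched in Python as follows. This is an illustrative reimplementation under simplifying assumptions, not the modified MATLAB toolbox used in the study: distances are plain Euclidean in whatever coordinates are supplied, empty clusters are left where they are, and clusters that still miss the size window after the last pass are kept as they are. The names `lloyd` and `equal_size_clusters` are hypothetical.

```python
import numpy as np

def lloyd(points, k, rng, n_iter=200):
    """One run of Lloyd's algorithm from a random seeding."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)        # nearest center for each point
        if np.array_equal(new_labels, labels):
            break                             # assignment unchanged: converged
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members):                  # empty clusters keep their center
                centers[j] = members.mean(axis=0)
    cost = ((points - centers[labels]) ** 2).sum()
    return labels, cost

def equal_size_clusters(points, m, tol=5, n_rep=100, max_pass=400, seed=0):
    """Peel off clusters whose size lies in (m - tol, m + tol), then repeat."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(points))
    out = np.full(len(points), -1)
    next_id = 0
    for n_pass in range(max_pass + 1):
        if len(remaining) == 0:
            break
        sub = points[remaining]
        k = max(1, round(len(sub) / m))       # target number of clusters, n/m
        labels, _ = min((lloyd(sub, k, rng) for _ in range(n_rep)),
                        key=lambda run: run[1])   # best of n_rep replicates
        sizes = np.bincount(labels, minlength=k)
        accept = sizes > 0
        if n_pass < max_pass:                 # the final pass keeps every cluster
            accept &= np.abs(sizes - m) < tol
        for j in np.flatnonzero(accept):
            out[remaining[labels == j]] = next_id
            next_id += 1
        remaining = remaining[~accept[labels]]
    return out

# Synthetic positions in a unit square; small settings keep the run short.
pts = np.random.default_rng(1).random((300, 2))
ids = equal_size_clusters(pts, m=30, n_rep=5, max_pass=10)
print(np.bincount(ids))                       # members per cluster
```

The defaults for `tol`, `n_rep`, and `max_pass` follow the values quoted in the list (5, 100, and 400); the example call uses smaller values only to limit the running time.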

The large number of iterations and the requirement of uniform splitting of the data make the analysis computationally intensive. For that reason, we do not perform a check for a "local minimum" (in terms of Eq. 5) by a series of reassignments of the points between clusters. Nevertheless, we found that repeated runs of the entire procedure described above led merely to a slightly different arrangement of clusters, while the reported results from the Z-test (Fig. 5) changed only within ±2%.

The running time of the entire procedure was ca. 6 h on an x86_64 GNU/Linux machine with 32 GB RAM.

References

Bauer S, Swenson MS, Griffa A, Mariano AJ, Owens K (1998) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean. J Geophys Res 103(C13):30855–30871

Bauer S, Swenson MS, Griffa A (2002) Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean: 2. Results. J Geophys Res 107(C10):3154

Brink KH, Beardsley RC, Paduan J, Limeburner R, Caruso M, Sires JG (2000) A view of the 1993–1994 California Current based on surface drifters, floats, and remotely sensed data. J Geophys Res 105(C4):8575–8604

Colin de Verdiere A (1983) Lagrangian eddy statistics from surface drifters in the eastern North Atlantic. J Mar Res 41:375–398

Davis RE (1991) Observing the general circulation with floats. Deep-Sea Res Suppl 38:S531–S571

Davis RE (1998) Preliminary results from directly measuring mid-depth circulation in the Tropical and South Pacific. J Geophys Res 103:24619–24639

Falco P, Griffa A, Poulain P-M, Zambianchi E (2000) Transport properties in the Adriatic Sea as deduced from drifter data. J Phys Oceanogr 30:2055–2071

Fratantoni DM (2001) North Atlantic surface circulation during the 1990's observed with satellite-tracked drifters. J Geophys Res 106(C10):22067–22093

Garraffo Z, Griffa A, Mariano AJ, Chassignet EP (2001) Lagrangian data in a high-resolution numerical simulation of the North Atlantic II. On the pseudo-Eulerian averaging of Lagrangian data. J Mar Syst 29:177–200

Griffa A (1996) Applications of stochastic particle models to oceanographical problems. In: Adler R, Muller P, Rozovskii B (eds) Stochastic modelling in physical oceanography. Birkhauser, Boston, pp 114–140

Jakobsen PK, Ribergaard MH, Quadfasel D, Schmith T, Hughes CW (2003) Near-surface circulation in the northern North Atlantic as inferred from Lagrangian drifters: variability from the mesoscale to interannual. J Geophys Res 108(C5):3251

Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892

Koszalka I, LaCasce JH, Orvik KA (2009) Relative dispersion in the Nordic Seas. J Mar Res 67:411–433

LaCasce J (2005) Statistics of low frequency currents over the western Norwegian shelf and slope I: current meters. Ocean Model 55:213–221

LaCasce J (2008) Statistics from Lagrangian observations. Prog Oceanogr 77(1):1–29

LaCasce J, Engedahl H (2005) Statistics of low frequency currents over the western Norwegian shelf and slope II: model. Ocean Model 55:222–237

LaCasce JH (2000) Floats and f/H. J Mar Res 58:61–95

Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137

Lumpkin R (2003) Decomposition of surface drifter observations in the Atlantic Ocean. Geophys Res Lett 30(14):1753

Lumpkin R, Flament P (2001) Lagrangian statistics in the central North Pacific. J Mar Syst 29:141–155

Lumpkin R, Garraffo Z (2005) Evaluating the decomposition of Tropical Atlantic drifter observations. J Phys Oceanogr 22:1403–1415

Lumpkin R, Treguier A-M, Speer K (2002) Lagrangian eddy scales in the Northern Atlantic Ocean. J Phys Oceanogr 32:2425–2440

MacKay DJC (2003) Information theory, inference, and learning algorithms. Cambridge University Press, Cambridge

Mariano A, Ryan E (2007) Lagrangian analysis and prediction of coastal and ocean dynamics (LAPCOD review). In: Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 13. Cambridge University Press, Cambridge, pp 423–467

Orvik KA, Niiler P (2002) Major pathways of Atlantic Water in the northern North Atlantic and Nordic Seas towards Arctic. Geophys Res Lett 29(19):1896

Owens WB (1991) A statistical description of the mean circulation and eddy variability in the northwestern North Atlantic using SOFAR floats. Prog Oceanogr 28:257–303

Poulain P-M (2001) Adriatic Sea surface circulation as derived from drifter data between 1990 and 1999. J Mar Syst 29:3–32

Poulain P-M, Warn-Varnas A, Niiler PP (1996) Near-surface circulation of the Nordic Seas as measured by Lagrangian drifters. J Geophys Res 101:18237–18258

Rossby HT, Riser SC, Mariano AJ (1983) The western North Atlantic: a Lagrangian viewpoint. In: Robinson AR (ed) Eddies in marine science. Springer, Heidelberg, pp 66–91

Rupolo V (2007) Observing turbulence regimes and Lagrangian dispersal properties in the oceans. In: Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian analysis and prediction of coastal and ocean dynamics, Chapter 9. Cambridge University Press, Cambridge, pp 231–274

Saetre R (1999) Features of the central Norwegian shelf circulation. Cont Shelf Res 19:1809–1831

Sallee JB, Speer K, Morrow R, Lumpkin R (2008) An estimate of Lagrangian eddy statistics and diffusion in the mixed layer of the Southern Ocean. J Mar Res 66:441–463

Skagseth Ø, Orvik KA (2002) Identifying fluctuations in the Norwegian Atlantic Slope Current by means of empirical orthogonal functions. Cont Shelf Res 22:547–563

Swenson MS, Niiler PP (1996) Statistical analysis of the surface circulation of the California Current. J Geophys Res 101(C10):22631–22645

Taylor GI (1921) Diffusion by continuous movements. Proc Lond Math Soc 20:196–212

Thompson A, Heywood KJ, Thorpe SE, Renner AH, Trasvina A (2009) Surface circulation at the tip of the Antarctic Peninsula from drifters. J Phys Oceanogr 39:3–25

Veneziani M, Griffa A, Reynolds AM, Mariano AJ (2004) Oceanic turbulence and stochastic models from subsurface Lagrangian data for the Northwest Atlantic Ocean. J Phys Oceanogr 34:1884–1906

Zhurbas V, Oh IS (2003) Lateral diffusivity and Lagrangian scales in the Pacific Ocean as derived from drifter data. J Geophys Res 108(C5):3141