Provenancing Metal Artifacts and Multivariate Statistics


PROVENANCING METAL ARTIFACTS AND MULTIVARIATE STATISTICS

M.J. Baxter, Division of Physics and Mathematical Sciences, Nottingham Trent University, U.K. (currently Emeritus Professor of Statistical Archaeology).
R.G.V. Hancock, Department of Medical Physics and Applied Radiation Sciences, McMaster University, Hamilton, Ontario, Canada.

Abstract: The paper can be thought of as the outcome of a dialog between MJB and RGVH that took place in the mid-2000s, which resulted in two poster presentations at a conference, the details of which I regret I have forgotten. I think it was Quebec, but I didn’t attend. Other than modifications to enhance continuity, the text is left as it originally was, with the two posters being merged. Something that emerged in the collaboration was what might be termed the merits of a ‘bivariate-splitting’ approach to data analysis for problems of this kind. Some might regard this as preferable to, and more comprehensible than, what they perceive as ‘complex’ multivariate analysis. Hancock et al. (2008, Archaeometry 50, 710-726) is an interesting exercise in applying bivariate-splitting to a large and complex ceramic compositional data set.

PART 1 – METHODOLOGY

Introduction

Many of the European-made cauldrons that were traded to North America were cut up at some point in their life to make other objects - early North American recycling? It appears that pieces of metal may have had enough value to be traded/gifted to people in neighbouring villages. So, along with the acquisition of whole kettles, this possible scrap metal trading means that at any one site archaeologists tend to find artifacts made of many different metal chemistries. This is not like the world of ceramics, where the tendency is to find only a few different chemistries, unless one is dealing with a site on a major trading route.

After thinking for a few years about the vagaries of elemental concentration data interpretation, RGVH assembled a small data set for the copper-brass system - 10 elements and 200 samples. This data set is made up of a number of discrete chemical groupings of each material type.

Archaeologically, the data set was designed to include samples of native coppers (usually low in trace elements) that were used in North America (NA) before Europeans arrived, and 17th century European trade goods made of copper and two different brasses (the latest to arrive in NA).

The samples were divided between five supposed archaeological village sites (A-E) to give students an idea as to how data analysis and interpretation may lead to conclusions of inter-site connections and relative periods of village habitation.

The data include well and poorly measurable elemental concentrations. The smelted materials are like coarse ceramics, not very homogeneous. The missing major elements are copper and lead. The material types are

Native coppers     19 samples
European coppers   37 samples
Red brasses        55 samples
Yellow brasses     87 samples
Unknown             2 samples

RGVH invited several researchers to investigate the data, with a view to seeing how successful different statistical methods were at recovering the structure known to be in the data. This part reports on MJB’s analysis, dealing with methodological questions, and the second part with the application and RGVH’s assessment of the merits of the analysis.

Statistical ideas 1

Distance – A main idea is that we are trying to identify distinct chemical compositional groups in the p-dimensional space defined by the variables we choose to use. A fundamental principle is that if we can find a view of the data that separates cases or groups, in terms of the distance between them, we can be assured that they are chemically distinct with respect to the variables used.

We can search for distinct groups, or cases, using univariate and bivariate graphical methods, including plots derived from principal components analysis (PCA). The upper plot below shows, for example, that there are at least three groups in the data defined by the amount of Zn present. The lower plot suggests that the two groups associated with the higher levels can be sub-divided by the levels of Sn present.

[Figure: Dotplot of Zn. Each symbol represents up to 2 observations.]

[Figure: Scatterplot of Sn vs Zn, with points marked by group.]

On the basis of the first plot (a dot-plot) a decision was made to separate out three main groups on the basis of Zn levels. A way of searching for further structure using bivariate plots is via a pairs or matrix plot, illustrated below for the low-Zn group. It is obvious that there are sub-groups in the low-Zn group.

The technique of ‘brushing’ is illustrated here. In the plot of Au against Sb a subset of high-Sb cases has been highlighted and this plots coherently elsewhere. It is a compositionally distinct group. The plot below highlights a low-Au/low-Sb group. This separates out on other plots so can be sub-divided.
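Brushing is normally done interactively in plotting software, but the underlying check can be sketched numerically. The following is a minimal illustration under stated assumptions: the data are made-up values rather than the RGVH data set, and the function names `brush` and `brushed_ranges` and the cut-off of 25 are hypothetical. It selects the high-Sb cases and then reports where they fall on every other variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: two made-up groups measured on three elements.
# Columns are Sb, Au, As in arbitrary units; these are NOT the RGVH data.
group_a = rng.normal(loc=[5.0, 1.0, 2.0], scale=0.3, size=(20, 3))
group_b = rng.normal(loc=[50.0, 3.0, 9.0], scale=0.3, size=(8, 3))
data = np.vstack([group_a, group_b])
names = ["Sb", "Au", "As"]

def brush(data, names, variable, lower):
    """Return a boolean mask of the cases whose `variable` exceeds `lower`."""
    return data[:, names.index(variable)] > lower

def brushed_ranges(data, names, mask):
    """For each variable, give (min, max) of the brushed cases and of the rest."""
    out = {}
    for j, name in enumerate(names):
        inside, outside = data[mask, j], data[~mask, j]
        out[name] = ((inside.min(), inside.max()),
                     (outside.min(), outside.max()))
    return out

# 'Brush' the high-Sb cases, then see how they sit on the other variables.
mask = brush(data, names, "Sb", lower=25.0)
ranges = brushed_ranges(data, names, mask)
for name, (inside, outside) in ranges.items():
    print(name, "brushed:", inside, "rest:", outside)
```

If the brushed cases occupy a narrow range that is clear of the remaining cases on other variables as well, that corresponds to the subset 'plotting coherently elsewhere'.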


Statistical ideas 2

Peeling – An idea that can be applied to any bivariate analysis of data (including the use of plots based on PCA) is what we have termed ‘peeling’ the data. Bivariate plots will reveal the more obvious structure in a data set, and their scale – and hence what can be seen – is often determined by very obvious group separation and/or the presence of outliers. By ‘stripping-out’ outliers and obvious groups for separate study less obvious structure can become apparent.

This has already been shown in the plots presented. It is obvious in the pairs plot that what we originally identified as a low-Zn group, when studied in isolation, shows further sub-groups based on the level of Zn.

Although we do not illustrate the idea in detail here, ‘peeling’ is a very useful way of using PCA. PCA plots based on the first two or three components are often rather good at showing obvious groups and outliers in the data. Structure that may be apparent in higher dimensions is obscured. As the more obvious features are ‘peeled’ from the data other structure ‘surfaces to the top’ and becomes apparent in the first two or three dimensions.

For reasons now to be discussed PCA has been used in this analysis mainly to check on the integrity of groups derived by other means. The value of ‘peeling’ a PCA is, however, worth noting. It can work well with data sets that are either quite small; have unusual structure; or contain lots of outliers, for which more ‘sophisticated’ grouping methods assuming multivariate normality and using Mahalanobis distance may not be suitable.
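One pass of ‘peeling’ a PCA might look roughly like the sketch below. In the analysis described here the obvious groups were picked out by eye from plots; the automatic rule used in this sketch (flagging cases far from the median on the first component, in robust MAD units) is an assumption made only so that the example runs unattended. The data are simulated, and the names `pca_scores` and `flag_far_cases` and the threshold of 8 are hypothetical.

```python
import numpy as np

def pca_scores(x, n_components=2):
    """Scores on the leading principal components of column-standardized data."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def flag_far_cases(scores, threshold=8.0):
    """Flag cases whose PC1 score is far from the median, in robust MAD units."""
    pc1 = scores[:, 0]
    centred = np.abs(pc1 - np.median(pc1))
    mad = np.median(centred)
    return centred / (1.4826 * mad) > threshold

rng = np.random.default_rng(1)
# Simulated data, not the RGVH set: 55 'bulk' cases plus 5 cases that lie
# far away on the first two of four variables.
bulk = rng.normal(0.0, 1.0, size=(55, 4))
far = rng.normal(0.0, 1.0, size=(5, 4)) + np.array([30.0, 30.0, 0.0, 0.0])
data = np.vstack([bulk, far])

flagged = flag_far_cases(pca_scores(data))  # first pass finds the obvious group
peeled = data[~flagged]                     # strip it out for separate study
scores_after = pca_scores(peeled)           # second pass on what remains
print(int(flagged.sum()), "cases peeled;", peeled.shape[0], "cases re-analysed")
```

Because the second PCA is standardized and rotated using only the remaining cases, its leading components are no longer taken up by the group that was removed, which is the sense in which less obvious structure can surface.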

Statistical ideas 3

Variable selection – When originally presented with the data and problem that form the basis for this work MJB explored the data using PCA (as well as more elementary methods).

There is strong structure in the data that PCA revealed; however, there was a problem. This was that as well as multi-modal (group) structure evidenced by Zn, other variables such as As and Au also showed multi-modal patterns that ‘cut-across’ those shown by Zn.

In PCA the variables are treated ‘equally’. A consequence of this is that some groups suggested by PCA contained cases with different levels of Zn. In effect the structure present in other variables took precedence over that shown by Zn. Is this reasonable? MJB, knowing little about metals, but having spoken to someone who did, decided not.

The separation into three groups based on Zn is rather obvious and it was decided to ‘peel’ the data on this basis, in the first instance. Similarly, within some of the Zn groups, there were obvious sub-groups based on the levels of Sn, and these were also peeled out for separate consideration.

Further analysis proceeded using a mixture of this kind of approach and PCA.

What is happening here is that some variables – Zn and then Sn – were ‘privileged’ in relation to others in the initial stages of group formation, with PCA relegated to the role of checking results.

In effect, variable selection is being used here to inform the grouping process. This is not easily embedded within many multivariate statistical procedures used in archaeometry.
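The idea of privileging Zn and then Sn can be sketched as a two-level split on single variables. In the analysis the cut points were chosen by inspecting plots; the rule below (cut wherever consecutive sorted values differ by more than a chosen gap) is a simplifying assumption, as are the function name `split_on_gaps`, the gap sizes and the toy values, none of which come from the RGVH data.

```python
import numpy as np

def split_on_gaps(values, min_gap):
    """Label cases by cutting the sorted values at gaps larger than `min_gap`."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    gaps = np.diff(values[order])
    # Every large gap starts a new group among the sorted cases.
    sorted_labels = np.concatenate([[0], np.cumsum(gaps > min_gap)])
    labels = np.empty(len(values), dtype=int)
    labels[order] = sorted_labels
    return labels

# Made-up toy values on arbitrary scales (not the RGVH data).
zn = np.array([0.1, 0.3, 0.2, 12.0, 13.5, 12.8, 28.0, 29.5, 30.1, 28.7])
sn = np.array([50, 60, 55, 200, 9000, 250, 300, 320, 15000, 14000], dtype=float)

# First split: Zn alone decides the main groups.
zn_group = split_on_gaps(zn, min_gap=5.0)

# Second split: within each Zn group, look for sub-groups on Sn.
sub_group = np.zeros_like(zn_group)
for g in np.unique(zn_group):
    members = zn_group == g
    sub_group[members] = split_on_gaps(sn[members], min_gap=5000.0)

labels = [f"{g}.{s}" for g, s in zip(zn_group, sub_group)]
print(labels)
```

The point of the design is that no other variable can override the Zn split: the Sn structure is only examined inside groups that Zn has already defined.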


Statistical ideas 4

Checking results – What we are doing here mimics the procedure known as hierarchical divisive clustering, but is informed by intelligent selection of splitting variables, rather than an automatic algorithm.

Results need to be checked to ensure

(a) that the defined groups are indeed distinct from each other;
(b) that the groups defined cannot be further sub-divided.

Multiple box-plots and PCA were the main tools used. Their application will be illustrated with reference to the low-zinc group, which was divided into six subgroups as follows.

Group
1    High Sb
2a   Low Sb   High Au   V. Low Zn
2b   Low Sb   High Au   Low Zn
3    Low Sb   Low Au    High Sn
4a   Low Sb   Low Au    Low Sn    Low As
4b   Low Sb   Low Au    Low Sn    High As

Terms such as ‘high’ and ‘low’ are used in a relative sense – what is termed ‘high’ here may not be so with respect to measurements in the other Zn groups.

Here the low-Zn group has been subdivided initially on the basis of Sb values, then on Au. Thereafter Sn, As and a further sub-division based on Zn levels have been used.

These made sense graphically; whether they make substantive sense is not something that MJB is competent to comment on.

A multiple boxplot of Au for the defined groups is shown. We can show that

1 separates from all other subgroups by virtue of higher levels of Sb, medium Au
2a separates on higher Au, Co, In
2b separates on higher Zn; also no Au, In overlap (highish)
3 separates from everything but 1 by virtue of higher levels of Sn
4b separates by virtue of high As and separates from 4a on Ag

Ag separates out 2a and 2b from the rest.

The PCA plot below confirms the separation. Two group 3 samples are isolated from the rest. They differ with respect to low levels of Ni. PCA can be used in this way to check the integrity of proposed grouping, and suggests that here we could split 3 into 3a and 3b. Further discussion and conclusions are provided in Part 2 of the paper.
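Check (a), that the defined groups are distinct from each other, was carried out here with multiple boxplots. A rough numerical counterpart, offered only as a sketch, is to list for each pair of groups the variables on which their observed ranges do not overlap. The function name `separating_variables`, the strict no-overlap criterion and the toy values are all assumptions; the values are invented and are not the RGVH measurements.

```python
import numpy as np
from itertools import combinations

def separating_variables(data, labels, names):
    """For each pair of groups, list variables whose observed ranges do not overlap."""
    labels = np.asarray(labels)
    result = {}
    for a, b in combinations(sorted(set(labels.tolist())), 2):
        xa, xb = data[labels == a], data[labels == b]
        # A variable separates the pair if one group lies wholly above the other.
        no_overlap = (xa.max(axis=0) < xb.min(axis=0)) | (xb.max(axis=0) < xa.min(axis=0))
        result[(a, b)] = [n for n, sep in zip(names, no_overlap) if sep]
    return result

# Invented toy values, two cases per group; columns are Sb, Au, Sn.
names = ["Sb", "Au", "Sn"]
data = np.array([
    [40.0, 2.0, 10.0], [45.0, 3.0, 85.0],   # group 1: high Sb
    [5.0, 9.0, 11.0], [6.0, 10.0, 9.0],     # group 2a: low Sb, high Au
    [4.0, 1.0, 80.0], [7.0, 2.0, 90.0],     # group 3: low Sb, low Au, high Sn
])
labels = ["1", "1", "2a", "2a", "3", "3"]

table = separating_variables(data, labels, names)
for pair, variables in table.items():
    print(pair, "separated on:", variables or "nothing")
```

A pair with an empty list would be a candidate for merging or for closer inspection. Check (b), whether a group can be further sub-divided, is not covered by this sketch.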


PART 2 – APPLICATION

Introduction

The background to the analysis is given above. This dealt with the archaeological background and the challenge posed by RGVH. In the course of outlining the approach to statistical analysis adopted by MJB, some analyses of what was termed a low-Zn group were undertaken for illustrative purposes. Here the analysis of the data is continued.

Some practicalities

In the analysis of multivariate archaeometric data, publications sometimes give the impression either that the defaults in the available statistical software have been used, or that procedures that have been established within a particular laboratory (or laboratories) over time are (more-or-less) applied in a routine fashion.

This is not (necessarily) a criticism. Some such procedures have been developed with a view to the characteristics of the material analysed, and have a long and honourable track record. Basing analyses on the logarithms (to base 10) of the data is standard in some approaches though, interestingly, it often does not make much difference to the conclusions.

The first thing to do is to look at the data. The most obvious characteristic was that several variables were multi-modal.

Some variables had unimodal, long-tailed distributions, suggesting that logarithmic transformation might be advantageous. Looking at the overall pattern, however, it was decided to operate on the original scale of the data, standardizing this for the purpose of multivariate analysis.

Dealing with measurements below the level of detection is also an issue. Here the rather simplistic solution of taking values recorded as < x as x was used. More subtle, principled and/or complex approaches exist, but it seemed to work well enough here.

This is more of an issue if using log-transformed data, since you cannot take logarithms of zero values, and the zero (i.e. below level of detection) replacement strategy can suggest spurious structure in the data.

Further analyses

Medium-Zn

The medium-Zn group was split into four subgroups. The evidence for these groups is less compelling than for the low-Zn group discussed in the companion poster (Part 1 here). The figure below shows a pairs plot for selected variables for these groups, 5a-5d.

Multiple boxplots show that as we move from 5a to 5d, Zn levels get lower but overlap remains. As we move from 5a to 5d, Sn levels get higher and 5a separates out. 5d separates out by virtue of higher Ag, As, Sb. 5c separates out on Ag, As.

The groups can thus be separated out with respect to several variables, but the possibility exists that what we are really doing is dividing a continuum from (High Zn; Low Sn, Ag, As) to (Low Zn; High Sn, Ag, As).

Groups 5a and 5b are not separated on the plot (separation is better but not marked on other PC plots) and it is only the level of Sn that separates them.

High-Zn

The high-Zn group was initially divided into three sub-groups on the basis of levels of Sn, which had a clearly multi-modal distribution within the group. None of these sub-groups were compositionally coherent and identifying ‘splitting’ variables proved trickier than for the other two groups. The approach eventually adopted was to use brushing techniques on pairs plots to identify (apparently) compositionally coherent sets of samples. This resulted in 10 sub-groups in all, which plot, for the most part, rather well and coherently on the plot of the first two principal components that follows.

It is possible to demonstrate that each of the proposed groups can be separated from each other group with respect to at least one variable. This is not simply demonstrated and not attempted here.

The plot also suggests that some of the groups (e.g., 7d) can be further sub-divided. This proves to be the case, but is not pursued further here.

Groups 6, 7 and 8 are distinguished by low, medium and high levels of Sn. Group 7d has lower levels of Sn than any of 7a-c.

To demonstrate that groups can be separated by the measured variables, it suffices to show that this can be done within 6, 7 and 8. The results of examining this are listed at the end of this section.

To summarise, it is possible to separate out all the groups from each other. The proposed groups actually show quite well on a plot of the first two PCs and would mostly be picked out without the labelling information. There is a case for further sub-dividing 7d, both in the plot shown and that on PC2 vs PC3 (not shown) which separates out samples 33, 34, 112, 115 and 125 (call this 7d1). On this and other component plots, 6c seems to plot more closely to a subset of 7d than it does to other 6s.

Ag – separates 7a from everything and 8b from 8a and 8c.
As – separates the 8s, and 7a and 7b from 7c and 7d.
Au – separates the 6s and 8s and all but 7c and 7d in the 7s.
Co – shows good separation within groups 6 and 8; 7a and 7b are separated apart from one sample.
In – effects reasonable separation within most of 6, 7, 8. 7c is lower than 7a and 7b; 7a tends to be higher than 7b, but there is overlap.
Mn – 8b and 8c are separated from everything but 6c and 7d by lower levels.
Ni – 8a has higher values than everything; 6a (higher) is separated from 6b and 6c.
Sb – separates 6a, 6b and 6c (high, medium, low); and 8b from 8a and 8c (higher).
Zn – separates 8c (higher) from 8a and 8b; group 6 has the higher levels.


Discussion

The data included the following material types within which there were sub-groups.

Native coppers     19 samples
European coppers   37 samples
Red brasses        55 samples
Yellow brasses     87 samples
Unknown             2 samples

RGVH’s summary of the findings is

Native copper: Found most of them in one group, but missed two because of their high Sn.
European copper: Found 4 groups; 1 combined group (i.e. combining some of the sub-groups).
Red brass: Found 3 groups; 1 combined group.
Yellow brass: Found 7 groups and combined others to form 5 other groups.

In all cases of combined groups there was no mixing of RGVH’s different material types - that is to say, samples from any one of the RGVH groups appeared only in one combined group. This means that in each case, either RGVH mis-labelled groups, or MJB did not abuse the data roundly enough.

(MJB would add that some of the analyses might have been taken further, but that in a substantive analysis this would normally be done in collaboration with the presiding archaeologist/archaeometrician!)

All in all, the five major sample types were sorted clearly, but some of the fine structure of similar groups was not unravelled. This means that the MJB data analysis would lead to appropriate inter-site trading interpretations, which is the ultimate aim of the exercise.