On Solving the Problem of Textual Contamination
-
Upload
uni-leipzig -
Category
Documents
-
view
1 -
download
0
Transcript of On Solving the Problem of Textual Contamination
1
Paper presented at the Workshop “Information Technologies and Innovation in Sanskrit Based
Indian Studies” held the University of Vienna, Department for South Asian, Tibetan and Buddhist
Studies on 26 March 2011.
On solving the problem of textual contamination by means of
computer aided stemmatics
Philipp Maas, University of Vienna
1) What is “Computer aided stemmatics”?
Computer aided stemmatics is a method to build reliable hypotheses of
transmission histories mainly by comparing different text versions.
The resulting hypothesis is then used as a guideline to establish the earliest
reconstructable version of this work (called archetype in stemmatics) and to
get an idea of how (and why) the literary work changed in the course of
transmission from the oldest reconstructable text version into several
different versions.
Computer aided stemmatics combines two complementary approaches: first,
the use of cladistic software to calculate and analyze cladograms, i.e. trees
having an internested branching structure. Cladograms serve as rough
estimates of the transmission history. The second approach is the philo-
logical discussion of variant readings, which basically consist of checking
whether the computer generated structure of a tree is acceptable from a
philological-historical perspective. If philological-historical arguments con-
tradict the cladistic hypothesis, the hypothesis has to be modified accor-
dingly in order to transform the cladogram into a stemma. Modifications
may concern the structure of the tree as well as decisions regarding the over-
all utility of witnesses for building a stemma.
2
1) An introduction to the basic principles of cladistics
Cladistics is a method which is used in different academic fields like
archaeology, linguistics, cultural sciences, in different philologies and, of
course, in its homeland of evolutionary biology. All these fields share the
aim to build hypotheses of the historical development of objects under
investigation (like artefacts, languages, rituals, recipes, literary works, and
biological species) from a recording of identical and different characteristics
of these objects. In cladistics, the recording of characteristics is transformed
into a graphic representation of the historical development in the from of a
branching diagram, a tree. A tree should account for the distribution of
characters (or variants) among the taxa (or manuscripts) under investigation.
In choosing the tree which fits the data best, the so-called parsimony
principle is used. This principle – frequently referred to as Occam’s razor –
is based on the assumption that if there are several alternative solutions for a
scientific problem, the most economical – or parsimonious – solution is to
be preferred as long as it is not contradicted by other known facts.
Parsimony translates into textual cladistics as follows.
Different versions of a text differ from each other in their variants. If two or
more textual witnesses share the same variant as against all other witnesses,
there are basically two possible explanations. Either one and the same
variant occurred several times in the history of transmission or the variant
occurred only once and was subsequently copied. The second explanation,
the more parsimonious one, is the one to be preferred under normal
conditions.
3
2) The application of cladistics under favourable conditions
Fig. 1: Data matrix created from ten instances of variation in six manuscripts
This data matrix records the variants occurring at ten exemplary instances of
variation for six manuscripts from a collation of the CS Vimānasthāna 8.
Within each column identical numerical symbols represent identical
readings in the manuscripts. For example, in column no. 1, number 0 is
recorded for manuscripts Ad, Chd, P1s, which means that they share a
common reading exclusively as against all other manuscripts. For manu-
scripts Ap1 and Ba1 number 2 is recorded, which means that these two
witnesses share a different reading; and finally the symbol 4 for B1d
indicates that this witness has a peculiar reading not shared by one of the
other manuscript. The choice of the numerical symbols is arbitrary, but it is
important to record variants in such a manner that identical symbols
represent identical readings.
In order to grant explanatory power to the present example, it is necessary
that we take each place of a variation to represent a class of similar cases,
which would occur in a recording of a not too short text passage.
Parsimony software is used to find the best, i.e. most parsimonious, repre-
sentation of the data in form of a bifurcate tree. The most parsimonious re-
presentation is the one that explains the distribution of variants among the
textual witnesses by assuming the lowest degree of textual changes along the
branch lines. It therefore amounts to assuming textual changes only in
instance, where the data provide evidence for such a change.
4
Fig. 2: Unrooted tree calculated from ten unordered and equally weighted variants. CI 1.0.
I fed the just mentioned data matrix to the cladistic computer program
PAUP* and applied the exhaustive search option under parsimony criteria.
The program found very quickly one most parsimonious unrooted tree out of
all together 105 possibilities. All recorded variants support this tree, which
means that there is a perfect consistency in the data. The degree of con-
sistency between data and tree can be measured and translated into the so-
called consistency index. In our case, where there is perfect consistency, the
consistency index has the numerical value of 1.0. A lower degree of
consistency would, of course, result in a lower value of the consistency
index.
In the present case, the high consistency between data and tree is not a big
surprise, since we are dealing here not with a real textual transmission, but
with exemplary variants selected with a view to demonstrate how cladistic
software works under favourable conditions and how the results of cladistics
compare to the results of classical stemmatics.
In order to do so, we first have to root the up to now un-rooted tree, since an
un-rooted tree cannot serve as a hypothesis on the development of a text in
time. Rooting does not affect the structure of the tree. All the lines that make
up the un-rooted tree remain unchanged. Rooting is nothing more than
5
identifying the particular point on a tree which deserves the apex position,
and then pulling this point upwards, which leaves the lines of the tree
hanging down. The position of the root in a cladistic tree corresponds to the
position of the archetype, the oldest reconstructable witness, in a stemma.
There is no way in text genealogy to identify the root in a cladogram by
mere numerical calculations. At least one variant which is exclusively trans-
mitted by a single group of manuscripts and which can confidently be
judged as being original has to be identified on the basis of philological con-
siderations. If the same group of manuscripts also contains at least one clear
variant of secondary origin, this group must go back to one of the hyp-
archetypes. All the other available witnesses accordingly go back to the
other hyparchetype, so that the archetype can be located at that part of an
unrooted tree which connects the hyparchetypes.
Fig. 3: Variant no. 2 traced along the tree
In the present example, manuscripts Ad Chd P1s transmit in variant no. 2 a
reading of secondary origin, where as Ap1 B1d Bad preserve at the same
instance the genuine reading. It may be noted that from a cladistic point of
view it is unclear whether the one or the other reading is original.
6
Fig. 4: Variant no. 8 traced along the tree
In variant no. 8, the opposite is the case. Ap1 B1d Bad transmit a reading of
secondary origin, whereas Ad P1s (and Chd) preserve the more original one.
According to the interpretation of these two readings, the root of the tree can
be located along the branch which connects the two groups of manuscripts
Ad Chd P1s and Ap1 B1d Bad.
As the result of the cladistic analysis we now have a hypothesis of the
development of the CS in time, which is based on a numerical calculation
and on the philological judgement of two variant readings.
But what is the relationship of this hypothesis to the remaining eight variants
when judged from a philological perspective? How would the result
achieved so far compare to a classical stemma, built by applying the princi-
ples of textual criticism as formulated by Paul Maas?
3) A methodological comparison of cladistics with classical stemmatics
The existence of peculiar readings in all manuscripts indicates that none of
the available witnesses was the immediate exemplar of one of the others.
The position of the taxa at the terminal points of branches in the cladogram
7
corresponds to the position they would have in a classical stemma at the end
of each line of transmission.
Fig. 5: Variant no. 5 traced along the tree
Moreover, classical stemmatics would ascribe a common exemplar to Chd
and P1s due to the connective error they share exclusively in variant no. 5.
And the same holds true for Ap1d B1d, which share a connective error in
variant no. 4.
Fig. 6: Variant no. 4 traced along the tree
From a stemmatic perspective, however, this reading is difficult to judge,
since an editor would have to know the temporal relationship between the
8
variants sa�darśitāni and darśitāni a priori in order to judge at the omission
of the preverb sa� as a connective error. In cladistics, such a judgement is
not necessary. On the backdrop of all available data, the individual reading
darśitāni can only be taken to be a corruption of sa�darśitāni.
The different interpretations of this variant in cladistics and stemmatics hint
at a fundamental difference between these two methods. Whereas classical
stemmatics is based on the judgement of individual readings, cladistics
involves considerations of the logical structure of the whole tree and on the
interrelationship of individual variants.
Under favourable conditions, i.e. in a transmission in which there is perfect
consistency between data and tree, cladistics produces the same result as
classical stemmatics at its best, but cladistics achieves this result much
faster. Cladistics is therefore superior to classical stemmatics in terms of
efficiency. Moreover, it is superior in terms of reliability, since in classical
stemmatics a misjudgement of the temporal relationship of alternative
variants may lead either to misjudging the data as partly contradictory or to
ascribing a wrong stemmatic position to one or more witnesses. This is ruled
out in cladistics. And finally, cladistics is to be preferred over classical
stemmatics also on the basis of methodological considerations. Its
application requires much less back-ground assumptions on the temporal
relationship of alternative readings than stemmatics. Whereas cladistics
requires just a small number of philological judgements in order to root a
previously established tree, classical stemmatics requires the philological
judgement of variants for each and every step of tree building.
4) Contamination in classical stemmatics
It may seem, however, that in demonstrating the superiority of cladistics
over stemmatics I am beating a dead horse, since stemmatics is nowadays
virtually out of use. The reason for this is the fact the classical stemmatics
9
simply does not work for real existing textual traditions of Sanskrit (and
other) works, since in real transmissions the occurrence of contamination
(i.e. mixing of two or more text versions into a single copy) is to be
assumed. In the following part of my presentation, I would therefore like to
show how cladistics works in dealing with contaminated transmissions.
5) Contamination in cladistics
In order to discuss the capability of cladistics in a contaminated trans-
mission, I recorded the variants of witness J1d to our data matrix. Manu-
script J1d was identified as being strongly contaminated in a previous study.
The cladistic software *PAUP calculates again one most parsimonious tree
from the data. This time, however, not all data is in harmony with the tree
structure. There is a logical conflict between different possibilities of
mapping the data on the tree. This conflict shows itself in a consistency
index of ca. 0.96 which is lower than the optimum amount of 1.0.
Fig. 7: Most parsimonious tree containing J1d. Consistency Index ca. 0.92
After rooting, the tree shape is almost identical to that of the previously
discussed tree. The only difference is that the newly added manuscript J1d
received the prominent position of being a sister of the youngest common
exemplar of Ad Chd and P1s. If this hypothesis is accepted, the readings of
10
J1d are decisive for a reconstruction of the archetype. All cases, in which
J1d reads together with two witnesses out of the family Ap1d, B1d and Ba1d
would then have to be considered as being of archetypal origin. But is this
hypothesis reliable?
From a cladistic point of view the hypothesis is simply the best one can
achieve, since the tree in its present shape corresponds to the most
parsimonious resolution of the data. From a historical-philological per-
spective, we are, however, obliged to question the cladistic hypothesis. The
clue to judge the reliability of the cladistic hypothesis is a discussion of
variant no 7, in which the data does not fit the tree.
Fig. 8: Variant no. 7 traced along the tree
The problem with this variant is that it cannot be traced consistently along
the present tree, since J1d and P1s share the common reading akleśam
asai��utva� as against akleśasahi��utva� in Ad and Chd. If we are willing
to accept the structure of the tree, we have to accept that an identical textual
change occurred in J1d and P1s independently. This explanation is quite
unconvincing, especially if we take the present variant not as an individual
case but as representing a whole class of variants. It is therefore necessary to
take an alternative explanation into consideration and to change the tree
structure in such a way that variant no. 7 indicates that J1d and P1s are in
fact sister manuscripts.
11
In the modified tree, the consistency of the data drops to a value of 0.92,
since now two variants do not show a most parsimonious resolution along
the tree.
Fig 9: Variant no. 2 traced along an alternative tree
The first inconsistent variant is no. 2, in which, as we have already seen,
manuscripts Ad Chd and P1s share the common error kā as against kārya in
all other witnesses.
Fig. 10: Variant no. 5 traced along an alternative tree
The second case is quite similar. Here manuscripts Chd and P1s share the
common writing error yathā as against the correct reading yasya in all other
manuscripts. J1d, on the other hand, does not transmit this error.
These two inconsistencies can easily be explained – without taking recourse
to improbable likelihoods – as the result of corrections. In all cases in which
12
J1d is free from variants of secondary origin transmitted in P1s, its readings
agree with the readings we find within the group Ap1d, B1d and Ba1d.
Therefore it is quite obvious that the scribe of J1d must have had access not
only to an exemplar which is closely related to P1s, but also to a second one,
which derived from the common archetype of Ap1d, B1d and Ba1d. In other
words, J1d can be considered a contaminated copy. Its true genealogical
position therefore is not close to the archetype as was indicated by cladistics,
but in close proximity to P1s.
6) Methodological consequences
First of all, it should have become clear that cladistics as such does not
provide a solution for the problem of contamination. Whenever two or more
exemplars were mixed systematically in order to improve the readings of the
main exemplar in the copy, cladistics will inevitably ascribe a too early
position to this contaminated witness. On the other hand, cladistic software
provides the opportunity to investigate the structure of cladistic trees and to
judge the informative value of individual variants.
The key to spot contaminated witnesses is the following consideration:
Unobjectionable readings pass frequently from a secondary exemplar into a
copy, since clear writing errors will usually be corrected as soon as they are
detected. Therefore, a small number of shared writing errors provides more
evidence for a close stemmatic relationship of witnesses than a large number
of meaningful readings. Based on this assumption it is possible to identify
contaminated manuscripts, to trace their main exemplar, to determine to
region of a stemma from which they received contaminational readings, and
to decide which manuscripts are decisive for the reconstruction of the
archetype.
The use of cladistic software therefore has a large potential to cushion or
even solve the long standing problem of contamination in stemmatics.