Pattern recognition methods in human-assisted reproduction


G. Patrizi (a), C. Manna (b), C. Moscatelli (a) and L. Nieddu (a)

(a) Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università degli Studi ‘La Sapienza’, Rome and (b) Centro di Riproduzione Umana e Terapia dell’Infertilità Genesis, Rome, Italy

E-mail: [email protected]

Received 4 June 2002; received in revised form 17 March 2003; accepted 30 June 2003

Abstract

The aim of this paper is to present a method which allows embryos that will lead to births to be selected, at the desired stage of development, more accurately than is currently done, although perhaps with some slight differences in precision, which are deemed to be important. Thus we show that embryos, pronuclei and oocytes consist of at least two types, those suitable for procreation and those not suitable, and that both types can be recognised easily by an automatic procedure at whatever stage is desired. From their digitised images before transfer, specific characteristics are formulated automatically and, through a particular pattern recognition algorithm, the specimen is recognised as belonging to one of two groups with high precision. The algorithm works in two stages: in the first, the images and their outcomes are used to train the algorithm; in the second, on the basis of the rule learnt in training, embryos, pronuclei or oocytes are classified. As multiple transfers and pregnancies occur, the training must be carried out with an imprecise ‘teacher’, through a suitable algorithm which can handle this.

Introduction

In-vitro fertilisation is widely used to treat infertility, although the rate of success of treatment is low, with an average overall birth rate per cycle of treatment of 13.9% (Templeton, Morris and Parslow, 1996). Various factors have been identified as affecting this rate, such as the age group of the women concerned, the duration of infertility, the usage of donor’s eggs and perhaps the treatment centre, a uterine factor and other subsidiary aspects (Templeton, Morris and Parslow, 1996).

There has also been some interest in detecting morphological factors of the embryonic cells through the application of classification criteria (Mills, 1992). Numerous other scoring methods are available to classify the specimens at various stages of development (Scott, Alvero, Leondires and Miller, 2000; Tesarik and Greco, 1999). In some cases, through comparative statics, in lieu of a dynamical monitoring of the development, some quite good results have been obtained, mostly through repeated evaluation during prolonged development outside the uterus, of 3 or even 5 days instead of 2 days (see Scott et al., 2000).

Intl. Trans. in Op. Res. 11 (2004) 365–379. © 2004 International Federation of Operational Research Societies. Published by Blackwell Publishing Ltd.

Delays in implanting may help the selection, by allowing a longer supervision that the development is proceeding correctly, but may hamper the embryo’s potential viability to bring forth a birth, through exposure to light and contaminants. Delay in selection will also give rise to the need to dispose of the additional embryos which are not required, and this disposal may conflict with one’s ethical beliefs.

All aspects which help in selecting, for transfer, the embryos which will lead to births should be pursued, and many authors have proposed social and biological characteristics of the mother as significant factors, but here again the results seem to be inconsistent (Roux, Joanne, Agnani, Fromm, Clavequin and Bresson, 1995; Terriou, Sapin, Giorgetti, Hans, Spach and Roulier, 2001).

The use of Operations Research methods for classifying specimens in two classes has also been implemented (see Grimaldi, Manna, Nieddu, Patrizi and Simonazzi, 2002), based on preliminary research on a limited sample (Grimaldi, Manna, Nieddu, Patrizi and Simonazzi, 2000; Simonazzi, 2000). The research and experimentation proposed here builds on these pioneering efforts to develop a much fuller treatment of the recognition of the suitability of the specimens in their progress from unfertilised eggs to embryos.

A better method for choosing suitable specimens to implant, if it can be formulated, at whatever stage of development, will raise the overall birth rate per cycle of treatment. It will allow a marked reduction in the number of specimens to be transferred, compared with less precise methods, to achieve a desired number of births. Obviously the actual transfer will be performed on specimens at a certain stage of development, but these could be picked out at the last possible moment, or during any of the previous stages, or the specimens to transfer could be chosen dynamically, based on the classification of their suitability as they pass through every stage of development.

The aim of this paper is to show that human embryos at the four-cell stage or eight-cell stage in human-assisted reproduction can be clearly distinguished into at least two types: developing embryos before transfer (DET), which will occasion a successful pregnancy, and not developing embryos before transfer (NODET), which will abort at some stage at, or following, the transfer. Moreover this selection may be done at the oocyte, pronuclei, or embryo stage of development for transfer at the embryo stage on the chosen day.

The human-assisted reproduction process consists of a set of activities in which a specialised medical staff cooperates with the prospective mother to give rise to a pregnancy and the birth of a baby, in circumstances in which this would not otherwise be possible. In Italy about 26,000 couples per year make use of such treatment. The prospective mother usually has a history of unsuccessful efforts to bring about a pregnancy. These may arise from malformations or pathologies on her part or on her partner’s. Thus the first duty of the medical staff is to study the problem and perform a series of tests, to determine the possibility of carrying out an assisted reproduction process (Moscatelli, 2002).

The first stage of the process is for the female to undergo a stimulation treatment with hormones for the development of multiple follicles, called superovulation, to ensure that a suitable number of mature eggs will be available. The follicle consists of a group of cells in the ovary within which the egg (oocyte) is situated. When the development of the follicle is complete a special hormone is administered and the follicle liquid is extracted from the womb. This is termed the ‘pick-up’. Following the pick-up, the maturity of the eggs, or oocytes, is examined under the microscope. The term oocyte refers to these developing egg cells before maturation. One of various techniques can now be used to achieve the fertilisation of the oocyte and the transfer into the patient’s uterus, to enable the oocyte to develop into an embryo and eventually give rise to a birth. Here we shall consider only the Intra Cytoplasmic Sperm Injection (ICSI) process, in which the sperm is inserted directly into the egg.

Methods have been proposed to evaluate different characteristics of the oocytes to determine their suitability for procreation. Thus, some researchers have tried to indicate the maturation and suitability for fertilisation of oocytes, by considering their morphology and attributing scores to a number of characteristics (Mahadevan and Fleetham, 1990; Serhal, Ranieri, Kins, Merchant, Davies and Kahdum, 1997; Veek, Wortham and Witmyer, 1983; Xia, 1997). Although the score attributed with the different methods is correlated with the expert’s evaluation of the quality and maturity of the oocyte, the degree of association between the score and the outcome (birth or no birth) is low and not statistically significant.

Fertilisation is achieved by inserting the male gamete (sperm) into the oocyte, so that the gametes, the male and female reproductive cells, can fuse.

The cells in the stage immediately after fertilisation, before the gametes have fused, are termed pronuclei; they are usually visible only for a short time, 18–22 hours after fertilisation. After fusion the pronuclei are no longer visible and, shortly after that stage, cell division (cleavage) commences. Again, efforts have been made to use the morphological information of the pronuclei to evaluate their quality and their suitability for implantation with a view to giving rise to a birth (Scott and Smith, 1998; Scott et al., 2000; Tesarik and Greco, 1999). Although the relationships seem to be stronger than in the case of oocytes, the predictive precision is not satisfactory.

The product of fertilisation is called the embryo when it is at an early stage of development, usually in the womb, but also in vitro in the laboratory, after fertilisation and before the eighth week. Technical terms are often given to differentiate the stages of development of the embryo, such as the blastocyst stage, which is the stage of development that the embryo has reached after 4–5 days, in which a fluid-filled cyst has formed and the cells have begun to differentiate into the inner cell mass (which will form the fetus) and the trophectoderm (which will form the placenta and fetal membranes). The classification of embryos regarding their suitability for initiating and bringing about a successful pregnancy is one of the major classificatory problems of human-assisted reproduction (Saith, Srivivasan, Mitchie and Sargent, 1998; Terriou et al., 2001).

Here we shall only be concerned with describing a suitable algorithm and giving the results for the recognition at each stage. The important aspect of using all three levels of information to perform the selection or the transfer policy will be discussed in another paper; here we shall be concerned with the selection as if the information were given independently at each stage, which clearly it is not.

The pattern recognition algorithm

Consider a set of objects which have been characterised by a set of common attributes and whose classification is, moreover, known. This is called a training set (Nieddu and Patrizi, 2000).


An algorithm which will realise a precise classifier to determine the class to which other similar unclassified objects belong is Total Recognition by Adaptive Classification Experiments (TRACE).

The data set, which includes the training set and the classification set, must be coherent, as defined in Nieddu and Patrizi (2000). This means that the training set must have objects whose class membership is mutually exclusive, collectively exhaustive, and such that no two identical objects are assigned to different membership classes. There are various ways to check that this is so, both for the training set and for the classification set. Furthermore, the classification of the objects in the training set may be assigned precisely or imprecisely.

Objects classified precisely are those on which a set of class labels is defined such that they are mutually exclusive and collectively exhaustive and thus form a coherent data set. This means that there exists a procedure, which can eventually be carried out, that classifies the objects with certainty in one of a number of classes. Coherency occurs when similar objects are classified in the same class. This will depend on the structure attributed to the different classes (Nieddu and Patrizi, 2000).

The classes assigned to a set of objects may have been determined with certainty, as when a person has died and an autopsy has been performed, or may be known imprecisely, based on incomplete evidence, which may be forthcoming later or not at all, perhaps only through an expensive analysis. In fact, a classifier is useful when it can be used to determine the class membership of objects quickly and at very low cost.

In pattern recognition the procedure by which labels are assigned to the objects used to determine a classifier is termed ‘the teacher’, which may be either a ‘precise teacher’, if the labels are assigned to objects with a procedure which proves precise, or an ‘imprecise teacher’, if the labelling procedure assigns objects to classes with error.

The objective of a pattern recognition algorithm is to formulate a classifier which will have the classification capability of a ‘precise teacher’, even in those cases where the objects considered in the training set have been assigned labels by an ‘imprecise teacher’. This leads to a fascinating methodological problem (Nieddu and Patrizi, 2000), which will be discussed further below.

The pattern recognition algorithm works as follows (see Fig. 1). Given a training set which is coherent and for which the class of each pattern is known, mean vectors, called barycentres, can be formed for each class. The Euclidean or generalised distance of each pattern from the mean vector of each class is determined. Out of all the patterns which are closer to a barycentre of another class, choose the one that is furthest from the barycentre of its own class and select this as a new barycentre. Thus for one class there are now two barycentres and all the patterns of that class can be reassigned to one or the other barycentre, depending on the distance of each pattern of that class from the barycentres of that class. The patterns in this class are thus repartitioned on the basis of their closeness to the two barycentres, so as to determine two new mean vectors for the two resulting subclasses of that class.

Each class will have a barycentre vector, except for one class which has two barycentres.

Calculate all distances anew and, using the same criterion to determine a new pattern to be used to form a new barycentre, repeat the procedure again and again, until all patterns are nearer to a barycentre of their own class than to one of another class.

The algorithm is then said to have converged. If the training set is piecewise linearly separable (there are no two identical pattern vectors which have been assigned to different classes), this will always be so, and the set of barycentres formed can be used in classification (Nieddu and Patrizi, 2000), while the classifier, if allowed to run until convergence, will correctly indicate the class label of every specimen in the training set.

Fig. 1. Flow-chart of TRACE algorithm. [Figure not reproduced. Its boxes read: START; compute the barycentre of each class; compute the distance of each pattern vector from all the barycentres; is every pattern vector closer to a barycentre of its own class? (Yes: STOP); mark all the elements which are closer to a barycentre of another class; pick the vector which is farthest from a barycentre of its class as the seed of a new barycentre; assign each pattern vector to a barycentre using a minimum distance criterion; compute all the barycentres anew; have the barycentres changed? (Yes/No).]

A spurious collection of entities, in which there are no similarity relations, may occur and should be recognised. With this algorithm, this occurrence is easily determined in training, as many barycentres are formed. Such spuriousness may arise even in the presence of some meaningful underlying relationships in the data, which are however swamped by noise, and so data reduction techniques may be useful (Nieddu and Patrizi, 2000; Watanabe, 1985).

It can also be shown that, for a large enough training set, the error rate made in the classification of specimens of unknown class from the data set can be bounded by a constant, if it contains no spurious subsets (Nieddu and Patrizi, 2000).

For verification a training set can be split by random sampling without replacement into two subsets, one used in training and the other for verification. Often a split of 90% in training and 10% in verification is used, but different values may be used depending on circumstances. Also, to obtain a low value of the standard error of the estimate, the process is replicated 150 times. This is the experimental set-up used in this paper. The number of trials and the relative size of the training set are conventional, but it is desired to choose sizes which allow a small standard error without reducing the training set too much.

When the classification of a training set is deemed to be imprecise, an iterative procedure may be used to correct the attributions. To this end, consider a large training set, so that presumably the probable classification error is small, as indicated above (Nieddu and Patrizi, 2000). If this is so, on replicating the classification procedure with different large subsets of the initial set, the individual replicated sample classification error rates should form an unbiased estimate of the overall classification error rate for the large set and thus also be small, given the initial assumption. This will be true if the objects in the overall training set have been classified correctly. But what happens if these objects have been classified imprecisely?

Suppose that the iterations are replicated 150 times, as described above. With 10% of the set held out each time, each object will appear in the verification sets about 15 times in every 150 replications. Thus if the object turns out to be misclassified with respect to its given classification, say, two-thirds of the time, then the classification that it received can be considered incorrect and its assigned class can be corrected to the class to which it has most frequently been assigned in the verification procedures.

The justification for this iterative procedure is the following. Under this experimental method, if a misclassified object appears in the subset used for training the classifier, it will form an abnormal barycentre, probably composed of this pattern alone, and will not interfere significantly in verification or classification. On the other hand, if an object has been misclassified by the ‘imprecise teacher’ and appears in verification, the object will be misclassified a high proportion of the time, because it will be attached to its correct group, while its given class will be the incorrect one assigned by mistake.

Thus it can be shown that, if an object has been classified imprecisely, i.e. with error, every time this object appears in verification the classification algorithm will assign it (with a small presumed error) to its correct class. However, when this is checked against the class label that it has actually received, there will be a discrepancy because of the original imprecise classification. Thus these objects, originally assigned the wrong class label, will continue to appear in verification as objects assigned incorrectly, in a statistically significant way. By reclassifying these objects to the class most frequently assigned in verification, if their misclassification rate is much greater than the rate implied by the classifier, the correct class is assigned to them (Bonifazi, Massacci, Nieddu and Patrizi, 1996). Thus with this correction procedure an ‘imprecise teacher’ may be rendered ‘precise’.

When all have been corrected, the training and verification procedures are applied anew on the corrected data set (see Bonifazi et al., 1996 for details). The whole process is known as classification with an imprecise teacher.
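The correction procedure for an imprecise teacher described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and a simple nearest-mean classifier with one barycentre per class is used here only as a stand-in for TRACE. The 150 trials, the 10% verification share and the two-thirds threshold are taken from the text.

```python
import random
from collections import Counter

def nearest_mean_classifier(train_x, train_y):
    """Stand-in for TRACE (assumption): one barycentre (mean vector) per class."""
    means = {}
    for k in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == k]
        means[k] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Class whose barycentre has the smallest squared Euclidean distance to x.
        return min(means, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, means[k])))

    return predict

def correct_labels(xs, ys, trials=150, holdout=0.10, threshold=2 / 3, seed=0):
    """Relabel objects that are persistently misclassified in verification."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [Counter() for _ in range(n)]
    size = max(1, round(holdout * n))
    for _ in range(trials):
        held = set(rng.sample(range(n), size))  # sampling without replacement
        train = [i for i in range(n) if i not in held]
        predict = nearest_mean_classifier([xs[i] for i in train],
                                          [ys[i] for i in train])
        for i in held:
            votes[i][predict(xs[i])] += 1
    corrected = list(ys)
    for i, v in enumerate(votes):
        total = sum(v.values())  # times object i appeared in verification
        if total and (total - v[ys[i]]) / total >= threshold:
            corrected[i] = v.most_common(1)[0][0]  # most frequently assigned class
    return corrected
```

In practice the corrected labels would be fed back into a new round of training and verification. The sketch leaves out the additional restriction on the number of births per patient that the paper imposes later.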


By using TRACE in this way, for sufficiently large samples, precise classification results can be obtained and the changes in the characteristics of the data set can be evaluated.
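For reference, the barycentre-splitting training loop of this section (Fig. 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published TRACE code: the function names are hypothetical, the distance is plain Euclidean, ties are broken by list order, and an iteration cap is added as a safeguard.

```python
import math

def mean_vector(points):
    """Component-wise mean (barycentre) of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def classify(barycentres, x):
    """Class label of the barycentre nearest to x (Euclidean distance)."""
    return min(barycentres, key=lambda b: math.dist(b[0], x))[1]

def refine(patterns, labels, barycentres, max_sweeps=100):
    """Reassign each pattern to the nearest barycentre of its own class and
    recompute the means, until the barycentres stop changing."""
    for _ in range(max_sweeps):
        members = [[] for _ in barycentres]
        for p, c in zip(patterns, labels):
            own = [j for j, (_, k) in enumerate(barycentres) if k == c]
            j = min(own, key=lambda j: math.dist(barycentres[j][0], p))
            members[j].append(p)
        # Barycentres left without patterns are dropped.
        new = [(mean_vector(m), k) for m, (_, k) in zip(members, barycentres) if m]
        if new == barycentres:
            break
        barycentres = new
    return barycentres

def trace_train(patterns, labels, max_iter=1000):
    """Return (vector, class) barycentres that classify the training set correctly."""
    barycentres = [
        (mean_vector([p for p, c in zip(patterns, labels) if c == k]), k)
        for k in sorted(set(labels))
    ]
    for _ in range(max_iter):
        # Patterns whose nearest barycentre belongs to another class.
        wrong = [i for i, (p, c) in enumerate(zip(patterns, labels))
                 if classify(barycentres, p) != c]
        if not wrong:
            return barycentres  # every pattern is nearest to its own class

        def own_distance(i):
            return min(math.dist(b, patterns[i])
                       for b, k in barycentres if k == labels[i])

        # Seed a new barycentre at the misplaced pattern farthest from its own class.
        seed = max(wrong, key=own_distance)
        barycentres.append((list(patterns[seed]), labels[seed]))
        barycentres = refine(patterns, labels, barycentres)
    raise RuntimeError("no convergence: the training set may not be coherent")
```

On a small XOR-like set, for instance `trace_train([(0, 0), (1, 1), (0, 1), (1, 0)], [1, 1, 2, 2])`, one class ends up with two barycentres, which is the piecewise linear separation that a single mean vector per class cannot provide.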

Methods and procedures

The research was conducted at a single fertility centre, during the period from January 1998 to December 2001, regarding 275 cycles of intra-cytoplasmic sperm injection (ICSI) for 195 patients. The rate of births per patient was 36.4%, while per cycle it was 25.8%.

The ovarian stimulation consisted of administering recombinant FSH (Gonal F, Serono) at a dosage of 150–400 IU according to individual response, after suppression with GnRH analogues (Suprefact, Hoechst). Oocyte retrieval was carried out with echo-guided transvaginal follicular aspiration. The laboratory procedures were performed by culturing gametes and embryos under oil in drops of culture medium (IVF, Scandinavia) in an atmosphere of 5% CO2 in air. The ICSI process was performed according to current methodology.

In the following subsections each experimental aspect is described and the computational techniques used are specified, while the results are given in the same order in the next section.

Determining the pattern vector and the class membership

The original procedure used photographs of 249 embryos at the four-cell stage taken 40–50 hours after fertilisation and before transfer, by placing each embryo under a microscope (Inverted Microscope Olympus IX 70) with a camera (videocamera JVC TK-C401EG). Examples of the images obtained are reproduced in Fig. 2, where four such embryos are shown (Grimaldi et al., 2002). All the embryos considered were subsequently transferred to uteri.

Characteristics were defined from these images by regarding each image as an array of intensities of shades of grey, from which the frequency distribution of the grey tones for each picture, in the horizontal and vertical directions, can be calculated. This pixel-intensity profile indicates the homogeneity of the image, or whether it has many dark and light spots and how they are distributed. From these distributions a certain number of moments (polynomial functions of central tendency and spread) can be calculated, to express the shape of the distribution in a standardised way. Thus a suitable number of these parameters, 10, 20, or 30, was used to define the pattern for each image. The results here are given only for the ten-element pattern vector (five moments in each coordinate direction), as this proved to be the most accurate (but see Grimaldi et al., 2002, for other results).

An image was assigned to class 2 if there was a birth from that embryo and to class 1 otherwise. Classes could initially only be assigned with certainty in the case of single embryo transfers, or when all the transferred embryos produced births, or when no births occurred as a result of that transfer. These are termed sure instances.

Multiple transfers with mixed results may be troublesome. Generally it is difficult to determine which one of a set of embryos transferred is responsible for the birth. Thus if three embryos are transferred and one baby is born, it must be determined which one of the embryos has given rise to the birth.
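The construction of the ten-element pattern vector described earlier in this subsection can be sketched as follows. The text does not state exactly which moments were used, so the choice here (the mean position followed by the central moments of order 2 to 5 of each intensity profile) is an assumption, as are the function names; the image is taken to be a list of rows of grey-level intensities.

```python
def profile_moments(weights, n_moments=5):
    """Moments of a profile treated as a distribution over positions:
    the mean, then the central moments of order 2..n_moments (assumed choice)."""
    total = sum(weights)  # must be non-zero, i.e. the image is not all black
    probs = [w / total for w in weights]
    mean = sum(i * p for i, p in enumerate(probs))
    central = [sum((i - mean) ** r * p for i, p in enumerate(probs))
               for r in range(2, n_moments + 1)]
    return [mean] + central

def pattern_vector(image, n_moments=5):
    """Concatenate the moments of the horizontal and vertical intensity profiles."""
    col_profile = [sum(col) for col in zip(*image)]  # intensity along the horizontal direction
    row_profile = [sum(row) for row in image]        # intensity along the vertical direction
    return profile_moments(col_profile, n_moments) + profile_moments(row_profile, n_moments)
```

With `n_moments=5` the result has ten elements, five per coordinate direction; the 20- and 30-element variants mentioned in the text would correspond to larger values of `n_moments` under the same assumption.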


Once a classifier has been obtained on the sure instances, all the rest of the embryos can be classified on the basis of this classifier. This will assign a class label to every embryo in the given set, but their classification will be imprecise because of errors that may have cropped up in classification and because, through this assignment, the predicted number of births may not match the actual number of births for each mother.

If, for any group of embryos transferred to the same uterus, the number of patterns classified as belonging to class 2 is greater than the declared number of births, the one or more patterns which are furthest from the closest barycentre of class 2 are arbitrarily assigned to the other class, and vice versa. Through the application of this algorithm every embryo was assigned to a specific class, so that the number of births predicted for each woman matches the actual total of babies born to her. This procedure was performed only once (see Grimaldi et al., 2002 for details).

As some embryos may be incorrectly classified due to sampling variation, the iteration procedure for an imprecise ‘teacher’ is then also applied, but with a restriction, as again the total predicted number of births must match the actual total for each woman.

Specifically, since the number of births is fixed for each patient, an additional condition must be imposed: a corresponding embryo transferred to the same uterus and assigned to the other class had to be similarly misclassified. Thus if, for a patient and a given cycle, it was found that in a high proportion of the trials two embryos assigned to different classes were both classified incorrectly in verification, their assigned classes were exchanged, after having run 150 trials, so that on average each embryo had appeared in verification about 15 times.
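The first adjustment described above, which makes the number of class 2 labels in a transfer group agree with the declared number of births, can be sketched as follows. The function name and its arguments are hypothetical; the distances of each embryo to the closest barycentre of class 1 and of class 2 are assumed to have been computed beforehand by the classifier.

```python
def match_births(predicted, dist_to_class1, dist_to_class2, births):
    """Adjust the labels of one transfer group so that exactly `births`
    embryos are in class 2 (assumes births <= number of embryos)."""
    labels = list(predicted)
    excess = labels.count(2) - births
    if excess > 0:
        # Too many predicted births: move to class 1 the class 2 embryos
        # furthest from their closest class 2 barycentre.
        idx = sorted((i for i, c in enumerate(labels) if c == 2),
                     key=lambda i: dist_to_class2[i], reverse=True)
        for i in idx[:excess]:
            labels[i] = 1
    elif excess < 0:
        # Too few predicted births: the converse, for class 1 embryos.
        idx = sorted((i for i, c in enumerate(labels) if c == 1),
                     key=lambda i: dist_to_class1[i], reverse=True)
        for i in idx[:-excess]:
            labels[i] = 2
    return labels
```

For a group of three embryos predicted as `[2, 2, 1]` with one declared birth, the class 2 embryo lying further from its barycentre is moved to class 1, so that a single embryo remains in class 2.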

Fig. 2. Set of four embryos, (a) to (d), transferred to the same uterus. [Images not reproduced.]


Under this algorithm there are nonlinear interactions between the barycentres formed in training, the incorrectly classified elements and the subsequent verification process. It is opportune, therefore, to iterate this correction procedure until there is no further gain in the average precision rate over 150 trials (Bonifazi et al., 1996).

Thus, an accurate training set was formed for the initial sample, together with its barycentre matrix, which could now be used to classify new specimens.

Another 556 images were collected, 416 of which were sure instances, while 140 came from multiple transfers with mixed results. The former were added to the initial sample as sure instances and training was performed once more, while the latter were considered a classification sample, to determine the class membership of each embryo.

Once classified, the data set was composed of 801 images with a given classification, which was still considered imprecise, because of sampling variability in the assignment, due possibly to the operation of an imprecise ‘teacher’.

Again the iteration procedure for an imprecise ‘teacher’ was repeated under exactly the same conditions as before, only with a much larger training set. After a few iterations of the algorithm, the most accurate training set was obtained.

Classification and tests of hypotheses

With the full sample, the classification experiment was performed. In each trial, 10% of the training set was selected by random sampling without replacement and set aside as a verification set, while the rest of the specimens were used to train the classifier for that trial. This was repeated 150 times. To be precise, these classification experiments were performed both with the original training set and with that obtained after the correction for an imprecise ‘teacher’.

Once the classification results have been obtained and the correction procedure terminated, a number of statistical tests must be performed to ensure that the differences in the recognition of specimens of one type and the other could not occur by chance.

The method used considers two completely independent tests of hypotheses, the first through a parametric test and the second through a non-parametric test.

The null hypothesis formulated for both tests is that there is no significant difference between the mean precision and the equiprobability rate, at a confidence level of 0.95 (a significance level of 0.05). By mean precision is meant the average precision over the given number of trials, while equiprobability means that, if the frequencies of prediction of each class by the classifier are arranged in two columns and the specimens are divided into two rows according to their assigned classes, the frequencies in each row are approximately the same. In that case the classifier does not separate the specimens by assigned class. If the tests show that the difference between the mean and equiprobability is significant, then the alternative hypothesis that there are systematic differences between the DET and the NODET specimens must be accepted.

The first procedure consists of the standard test of difference between two means (Kendall, Stuart and Ord, 1979). The standardised mean precision rate is compared to the equiprobability rate, as there are only two classes. The second procedure consists of forming two-way categorical tables of the outcome of the trials for the classification of embryos, yielding a test with an identical hypothesis structure (Agresti, 1990).
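The two tests can be illustrated with a short sketch. The exact statistics used in the paper are not given in this section, so the following are assumptions: a two-sided z test of the mean precision over the trials against an equiprobability rate of 0.5, and a Pearson chi-square test of independence on a 2 x 2 table of assigned class against predicted class. The function names are hypothetical.

```python
import math

def z_test_against_half(precisions):
    """Two-sided z test of the mean precision over the trials against 0.5
    (normal approximation; 0.5 is the assumed equiprobability rate)."""
    n = len(precisions)
    mean = sum(precisions) / n
    var = sum((p - mean) ** 2 for p in precisions) / (n - 1)
    se = math.sqrt(var / n)  # standard error of the mean
    z = (mean - 0.5) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return mean, se, z, p_value

def chi_square_2x2(table):
    """Pearson chi-square test of independence on a 2x2 table (1 degree of
    freedom); every expected frequency must be non-zero."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, r in enumerate(table) for j, obs in enumerate(r))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A table with identical rows gives a chi-square statistic of zero, which is the equiprobability case in which the classifier does not separate the specimens by assigned class.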


Any numerical disparity between the tests is due to the different underlying probability distribution assumptions, but both tests should lead to the acceptance or rejection of the null hypothesis at a very high level of significance.

Four additional experiments were attempted, over and above the initial one indicated in Grimaldi et al. (2002). During 2001, two-day embryos, three-day embryos, pronuclei, and oocytes were photographed with the same equipment as described above, and a label was assigned and carefully noted. The oocytes before fertilisation were also photographed and a label was assigned and carefully noted, and finally, for the chosen pronuclei which were allowed to progress into embryos for transfer, again a label was assigned and carefully noted. Only the oocytes and the pronuclei which were allowed to develop into embryos and were transferred are considered in this research, as otherwise their membership class would not be determinable.

All three types of specimens can now be classified, and it is believed that an analysis of their passage through each stage will be most beneficial for research, but this is not attempted in this paper. Instead, results are given regarding the prediction of births based on images of oocytes, pronuclei, and embryos considered independently.

It is not the policy of the centre to dispose of embryos, so all pronuclei which were allowed to develop, unless obviously unusable, were transferred.

The complete experimentation consisted of:

• results on the initial classification and on the sample corrected for the imprecise ‘teacher’, for the original sample of 245 embryos;

• results on the augmented sample of 801 embryos, both initially and after the imprecise ‘teacher’ corrections;

• results on the initial and corrected sample of 133 three-day embryo images. This arose because in the literature there is a certain emphasis on the claim that transferring three-day embryos would lead to a higher birth rate, so this was tried from May to October 2001;

• results on the classification of 218 oocytes, each of which corresponds to a given embryo and thus can be classified in accordance with the latter;

• results on the classification of 165 pronuclei, each of which corresponds to a given embryo and thus can be classified in accordance with the latter.

Experimental results

The individual specimens are easily recognised by a particularly precise pattern recognition algorithm, TRACE, as belonging to one type or to the other, based on the earlier study of a precise classification of embryos (Grimaldi et al., 2002). Thus, identification of DET specimens will allow the implantation of viable embryos, and this should yield a much higher success rate in childbirth in human-assisted reproduction and a marked reduction in the number of embryos transferred to a uterus to achieve childbirth.

The original results are reported in the first two rows of Table 1, as indicated elsewhere (Grimaldi et al., 2002). In the second column the mean precision rate is reported, while in the third column the standard error of that estimate is indicated (s.e.e.). Then the best and the worst rates of recognition are indicated for each run of 150 trials, and finally the number of specimens in the sample considered is indicated.
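The columns just described can be obtained from the per-trial precisions as in the following sketch. It assumes that the s.e.e. column is the standard error of the mean over the trials of a run; the function names, the verification-set size of 25 and the hit rate of 0.8 in the simulated data are hypothetical and serve only to produce example input.

```python
import math
import random
import statistics


def summarise_trials(precisions):
    """Mean, standard error of the mean, best and worst precision over a run of trials."""
    n_trials = len(precisions)
    mean = statistics.fmean(precisions)
    # Standard error of the mean: sample standard deviation over sqrt(number of trials).
    see = statistics.stdev(precisions) / math.sqrt(n_trials)
    return {
        "mean": mean,
        "see": see,
        "best": max(precisions),
        "worst": min(precisions),
    }


def simulate_trials(n_trials=150, n_verification=25, hit_rate=0.8, seed=0):
    """Hypothetical data: each trial is the fraction of verification specimens classified correctly."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < hit_rate for _ in range(n_verification)) / n_verification
        for _ in range(n_trials)
    ]


if __name__ == "__main__":
    print(summarise_trials(simulate_trials()))
```

Note that the last column of the tables reports the number of specimens in the sample, not the number of trials, so it is not part of this summary.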


With the uncorrected initial sample, the best average precision of 78.4% in verification was obtained for the ten-element patterns. In eight trials of the 150 the correct classification was greater than 90% and in one trial it was completely correct (Grimaldi et al., 2002).

The initial sample was corrected for an imprecise ‘teacher’ and it was found that seven patients had embryos apparently misclassified, consisting of eight pairs of embryos which had the required misclassification characteristics, totalling 16 embryos, so the class membership of each embryo in the pair was swapped. The modified data set was reclassified and the classification was performed anew 150 times. The results are reported for completeness in the second row of Table 1.

In Fig. 2, the four embryos shown composed one such set. Thus embryos (a) and (b), attributed to class 2 by the procedure above, were predominantly recognised as class 1, while embryos (c) and (d), attributed to class 1, were predominantly assigned to class 2. Since these four embryos had been transferred to the same uterus, they should be considered imprecisely classified and their membership class changed.

The new embryo images were added and classified on the basis of the larger training set consisting of all the initial sample and the new sure instances, so that all received a class membership label. Then the combined sample of 801 patterns was run 150 times and the results are reported in the third row of the table.

The whole sample was then once more subjected to the correction procedure for the imprecise ‘teacher’, the required classifications were altered, and the results obtained after a subsequent run of 150 trials are indicated in the fourth row for the combined corrected sample. Additional experiments were conducted on three-day embryos, pronuclei, and oocytes.

The images of the new embryos transferred on day 3 were now examined by the same methodology. Of these embryos, 103 were known certainly to be of class 1, since these had been transferred to patients who had not given birth. Also, one patient had three embryos implanted and gave birth to three children, so 106 embryos were sure instances, and the appropriate class membership was assigned to the remaining 27 embryos by considering this set of three-day embryos as a classification set in which the class membership of every specimen was to be determined.

It would of course not have been correct to use the barycentre set which had been formed in the previous experiment, since we wanted to establish whether these two data sets can actually be combined.

Once the classification of the whole sample had been obtained, a run of 150 trials with a random verification set, as described above, was carried out and the results are recorded in the fifth row, ‘Day 3 sample uncorrected’. This data set was subjected to the usual correction procedure for an imprecise ‘teacher’ and the results are given in the next row.

Table 1
Embryo classification results for TRACE: precision in the verification samples for various sampling procedures, two classes, five moments per instance.

Sample                        Mean    s.e.e.    Best    Worst   Number
Original sample uncorrected   0.7844  0.006531  1.0000  0.5200  245
Original sample corrected     0.8179  0.006129  1.0000  0.6000  245
Combined sample uncorrected   0.7980  0.004103  0.8889  0.7046  801
Combined sample corrected     0.8406  0.003645  0.9383  0.7531  801
Day 3 sample uncorrected      0.7589  0.010993  1.0000  0.4286  133
Day 3 sample corrected        0.8126  0.009692  1.0000  0.5000  133

For the classification of the oocytes and the pronuclei, each specimen was assigned to the final class of the embryo that had developed from it, and so for this experimentation the classifications were considered definite. Thus two runs of 150 trials of TRACE were carried out and the results are recorded in Table 2.

Table 2
Oocyte and pronuclei classification results for TRACE: precision in the verification samples, two classes, five moments per instance.

Sample                        Mean    s.e.e.    Best    Worst   Number
Oocytes sample                0.6957  0.009603  0.8696  0.4444  218
Pronuclei sample              0.8509  0.007882  1.0000  0.5882  165

Each estimate was subjected to two different tests of hypothesis: a parametric test and a non-parametric test. We next provide details for the most important tests, indicating that these results could not have arisen by chance.

The first procedure consists in the standard test of difference between two means. The mean precision rate obtained for the first experiment (0.7844) is compared to the equiprobability rate (0.5), the difference being divided by the standard error of the estimate, which is 0.0065 (see Kendall, Stuart and Ord, 1979). This gives a standardised difference of roughly 44. The null hypothesis must be rejected at the given level of significance (in fact the observed probability value (p-value) was much lower), so the alternative hypothesis must be accepted.

The second procedure consists in forming two-way categorical tables of the outcome of the trials for the classification of embryos before subsequent elaborations (Agresti, 1990). The cumulative two-way categorical table over the 150 trials was determined and the log cross-product ratio test (or odds ratio test, as it is often called) was applied, yielding a test of hypothesis with an identical hypothesis structure (Agresti, 1990).

The null hypothesis of independence in the columns of the table must be rejected at the given level of significance (in fact the observed p-value is much lower) and therefore the alternative hypothesis is accepted.

The probability that such an experiment came about purely by chance, implying that there are no systematic differences in the pattern vectors due to class membership, is less than one chance in 1,000,000, so the conclusion is that there are systematic differences in the pattern vectors originating from DET and NODET embryos. As the pattern vectors are derived from the pixel maps of the images of the embryos, these differences must occur at the level of the images and therefore concern the morphology of the embryos.

Similar results and conclusions follow for the remaining rows in Table 1 and for both rows in Table 2. Thus, at a very high level of statistical significance, the null hypothesis is rejected in all cases.

For the three-day embryos, as compared to the combined sample results (the results of row 3 to those of row 5 and those of row 4 to those of row 6, all for Table 1), the tests of hypothesis indicate that the statistics came from two different populations. In fact the difference between the two uncorrected sample proportions (0.7980 and 0.7589) or the two corrected sample proportions (0.8406 and 0.8126) cannot be ascribed to sampling variability, so the null hypothesis that there are no significant differences between two-day and three-day embryos must be rejected at the given confidence level. Thus it is wrong to recognise the latter by using a training set of the former, and vice versa. Significant morphological changes occur in the images of these three-day embryos compared to those of two-day embryos, which therefore lead to some difficulties in the comparison of their respective images.

We can test the difference in the mean precision rate for the pronuclei sample against the two experimental samples, i.e. the original corrected sample composed of 245 patterns and the combined corrected sample of 801 patterns (see Table 1, row 2 and row 4). Whether we apply a weighted standard error of the estimates, as the sample sizes are different, in the standard statistical procedure for testing the difference between two means, or perform the non-parametric test, we obtain the result that there is no significant difference at a confidence level of 0.95 between the recognition rates of pronuclei and either of these experimental samples. Instead, on repeating the same statistical procedures with the oocyte sample, significant differences are found between the recognition rate of oocytes and these other experimental samples.
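The comparisons between samples can be illustrated with the figures of Tables 1 and 2. The sketch below is only an approximation under stated assumptions: it treats the s.e.e. columns as standard errors of independent means and combines them in quadrature, without weighting, whereas the authors apply a weighted standard error whose exact form is not given here, so its values need not coincide with theirs. The function and constant names are hypothetical.

```python
import math


def z_difference(mean_a, se_a, mean_b, se_b):
    """z statistic for the difference of two independent means.

    The two standard errors are combined in quadrature (unweighted),
    which is an assumption of this sketch.
    """
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return z, p_value


# (mean, s.e.e.) pairs copied from Tables 1 and 2.
ORIGINAL_CORRECTED = (0.8179, 0.006129)
COMBINED_UNCORRECTED = (0.7980, 0.004103)
COMBINED_CORRECTED = (0.8406, 0.003645)
DAY3_UNCORRECTED = (0.7589, 0.010993)
DAY3_CORRECTED = (0.8126, 0.009692)
OOCYTES = (0.6957, 0.009603)

COMPARISONS = {
    "two-day vs three-day, uncorrected": (COMBINED_UNCORRECTED, DAY3_UNCORRECTED),
    "two-day vs three-day, corrected": (COMBINED_CORRECTED, DAY3_CORRECTED),
    "oocytes vs original corrected": (OOCYTES, ORIGINAL_CORRECTED),
    "oocytes vs combined corrected": (OOCYTES, COMBINED_CORRECTED),
}

if __name__ == "__main__":
    for label, (sample_a, sample_b) in COMPARISONS.items():
        z, p = z_difference(*sample_a, *sample_b)
        print(f"{label}: z = {z:.2f}, p = {p:.4g}")
```

In each of these four comparisons the absolute standardised difference exceeds the two-sided critical value of 1.96 for a confidence level of 0.95, in line with the conclusions stated above for the three-day embryos and the oocytes.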

5. Discussion

More accurate proportions can be obtained by extending the sample of specimens used in the experiment (see Nieddu and Patrizi, 2000). This can be seen by comparing row 1 with row 3 and row 2 with row 4 in Table 1. Thus, the prediction error will decrease on suitably enlarging the training sample. This will render false negative results (embryos considered NODET when they are DET) increasingly unlikely and, in fact, this aspect is clearly seen in the results reported, although the increase in accuracy is less than proportional to the increase in the sample size.

The proportion of false negatives is not at issue here. The point is that embryos can be easily recognised as DET and NODET before being transferred and this can be done with whatever accuracy is desired (Grimaldi et al., 2002).

The characteristics of the potential mother do not affect the recognition, since this is done before transfer. There may be a sampling effect in the experiments conducted, due to the particular sample of mothers and fathers used, but this effect must be small since verification was always conducted on different sets of mothers through the 150 trials carried out at each stage. Sampling effects, in small samples, may slightly affect the precision, but not the relationship or the existence of these two types of embryos. This is in accordance with other recent results (Scott et al., 2000; Tesarik and Greco, 1999).

The residual imprecision may also be due to social and biological aspects of the patients. This can be seen if tests of significance are run between the actual precision and an extremely high level of precision, which would result if recognition were almost perfect (say 0.97 or higher). The null hypothesis will in all cases be rejected, leading to the conclusion that there are systematic residual effects, only in part explainable by the size of the sample.

This may be investigated further by considering the recognition of the suitability of future embryos from the oocytes. As is well known, the embryo is generated by the action of the male sperm, which is totally missing from this classification. Thus, the lower precision rate may be due, over and above the other residual effects, to the contribution of the male.
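As a worked illustration of the test against a near-perfect precision mentioned above, take the combined corrected sample of Table 1 and a target rate of 0.97. The normal approximation and the use of the s.e.e. column as the standard error are assumptions of this sketch, and the symbols \bar{p} and p_0 are introduced here only for the example.

```latex
z = \frac{\bar{p} - p_0}{\mathrm{s.e.e.}}
  = \frac{0.8406 - 0.97}{0.003645}
  \approx -35.5
```

The absolute value is far beyond the two-sided critical value of 1.96, so the hypothesis that the observed precision equals 0.97 is rejected for this row.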


The transfer of only the resulting DET embryos would greatly increase the efficacy of human-assisted reproduction, even though the actual success for each woman will depend on her actual characteristics. Such a policy would greatly reduce the number of transfers per woman.

There are various schemes to classify potentially good embryos, but these systems do not seem to be very effective. For instance, the embryo shown in Fig. 3 would receive a score of 5(1,1) under the Mills scheme (Mills, 1992), and therefore it constitutes a very good specimen for child-bearing in that classification scheme. It turns out that it did not give rise to offspring, and this was correctly recognised by the algorithm, which classed it as a NODET embryo (Grimaldi et al., 2002). By analysing the images many other such examples can be given, even with other scoring methods, in terms of both false positives and false negatives.

The images of the embryos in Fig. 2 are particularly thought-provoking. Visual inspection by an expert does not seem to suggest any possible criteria for the recognition of the valid embryos, yet there are structural differences in embryos (c) and (d) which allow the algorithm to differentiate them correctly from the images (a) and (b) of the embryos. Being certain that such differences exist and can be perceived, albeit not by an unaided human expert, is an important result, full of potential clinical applications in this and in many other fields.

Once the existence of different types of embryos has been ascertained, techniques similar to these can be used to study the structural differences between them, and so advance clinical results in human reproduction.

This algorithm and the results obtained not only contribute to the solution of clinical problems in reproduction, but can also be used to solve a number of important taxonomic and recognition problems in medicine, both with regard to function and structure, since the algorithm allows us to determine the existence of differences not discernible by human senses.

Fig. 3. Image of an embryo with a high quality score by the Mills method.

Acknowledgements

The authors would like to thank Professor Giuseppe Grimaldi for bringing together embryologists, human reproduction specialists and operations researchers, thus permitting a synergy to develop, and Dr Paola Simonazzi for having broached this subject of research in her Laurea thesis. They are also extremely grateful to Dr Ashraf Rahman and the medical staff of the Genesis Centre for their guidance and assistance. They gratefully acknowledge the invaluable help of the anonymous referees, while retaining all responsibility for any remaining errors.

References

Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.

Bonifazi, G., Massacci, P., Nieddu, L., Patrizi, G., 1996. The classification of industrial sand-ores by image recognition methods. In: Proceedings of 13th International Conference on Pattern Recognition Systems, Vol. 4: Parallel and Connectionist Systems, pp. 174–179. I.E.E.E. Computer Society Press, Los Alamitos, California.

Grimaldi, G., Manna, C., Nieddu, L., Patrizi, G., Simonazzi, P., 2000. A diagnostic decision support system and its application to the choice of suitable embryos in human assisted reproduction. Technical report no. 18, Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università degli Studi ‘La Sapienza’, Rome, Italy.

Grimaldi, G., Manna, C., Nieddu, L., Patrizi, G., Simonazzi, P., 2002. A diagnostic decision support system and its application to the choice of suitable embryos in human assisted reproduction. Central European Journal of Operational Research, 10 (1), 29–44.

Kendall, M., Stuart, A., Ord, J.K., 1979. The Advanced Theory of Statistics. Griffin, London.

Mahadevan, M.M., Fleetham, J., 1990. Relationship of human oocyte scoring system to oocyte maturity and fertilization capacity. International Journal of Fertility, 240–244.

Mills, C.L., 1992. Factors affecting embryological parameters and embryological selection for IVF-ET. In: Group, T.P.P. (Ed.), In Vitro Fertilization and Assisted Reproduction. The Parthenon Publishing Group, London, pp. 187–204.

Moscatelli, C., 2002. Classificazione di embrioni ed ovociti nella fecondazione assistita: Approccio medico e statistico a confronto. Technical report, Tesi di Laurea, Facoltà di Scienze Statistiche, Università La Sapienza, Rome, Italy.

Nieddu, L., Patrizi, G., 2000. Formal properties of pattern recognition algorithms: A review. European Journal of Operational Research, 120 (3), 459–495.

Roux, C., Joanne, C., Agnani, G., Fromm, M., Clavequin, M.C., Bresson, J.L., 1995. Morphometric parameters of living human in vitro fertilization embryos. Importance of the asynchronous division process. Human Reproduction, 10 (5), 1201–1207.

Saith, R.R., Srinivasan, A., Michie, D., Sargent, I.L., 1998. Relationships between the developmental potential of human IVF embryos and features describing the embryo, oocyte and follicle. Human Reproduction Update, 4 (2), 121–134.

Scott, L.A., Smith, S., 1998. The successful use of pronuclear embryo transfer the day following oocyte retrieval. Human Reproduction, 13 (4), 1003–1013.

Scott, L., Alvero, R., Leondires, M., Miller, B., 2000. The morphology of human pronuclear embryos is positively related to blastocyst development and implantation. Human Reproduction, 15 (11), 2394–2403.

Serhal, P.F., Ranieri, D.M., Kins, A., Merchant, S., Davies, M., Kahdum, I.M., 1997. Oocyte morphology predicts outcome of intra cytoplasmatic sperm injection. Human Reproduction, 12 (6), 1267–1270.

Simonazzi, P., 2000. Algoritmi di riconoscimento di immagini di blastomeri nella riproduzione assistita. Technical report, Tesi di Laurea, Facoltà di Scienze Statistiche, Università La Sapienza, Rome, Italy.

Templeton, A., Morris, J.K., Parslow, W., 1996. Factors that affect outcome of in-vitro fertilisation treatment. The Lancet, 348 (23), 1402–1406.

Terriou, P., Sapin, C., Giorgetti, C., Hans, E., Spach, J.-L., Roulier, R., 2001. Embryo score is a better predictor of pregnancy than the number of transferred embryos or female age. Fertility and Sterility, 75 (3), 525–531.

Tesarik, J., Greco, E., 1999. The probability of abnormal preimplantation development can be predicted by a single static observation on pronuclear stage morphology. Human Reproduction, 14 (5), 1318–1323.

Veek, L.L., Wortham Jr., J.W.E., Witmyer, J., 1983. Maturation and fertilization of morphologically immature human oocytes in a program of in vitro fertilization. Fertility and Sterility, 57 (4), 594–602.

Watanabe, S., 1985. Pattern Recognition: Human and Mechanical. Wiley, New York.

Xia, P., 1997. Intra cytoplasmatic sperm injection: correlation of oocyte grade based on polar body, perivitelline space and cytoplasmatic inclusions with fertilization rate and embryo quality. Human Reproduction, 12 (8), 1750–1755.
