
Statistical Methods for Machine Translation

Stephan Vogel, Franz Josef Och, S. Nießen, H. Sawaf, C. Tillmann, Hermann Ney

Lehrstuhl für Informatik VI, RWTH Aachen, Germany

May 30, 2000

Abstract. In this article we describe the statistical approach to machine translation as implemented in the stattrans module of the VERBMOBIL system.

1 Introduction

In this paper, we describe the present status of the machine translation approach developed at RWTH Aachen and report experimental results obtained for the Verbmobil task. The ultimate goal of this task is spontaneous speech translation as opposed to text translation. The experimental tests are reported for both text and speech input.

There are a couple of characteristic features addressed in our system and in this paper:

– In the Verbmobil task, the translation direction from German to English poses special problems due to the big difference in the word order of the German and English verb groups. In addition, there are the word compounds in the German language like Geschäftsreise for business trip that require refined alignment models.

– In addition, the bilingual corpus is a transcription of spontaneously spoken sentences. Thus, it exhibits the typical phenomena of spontaneous speech, such as high variability of the syntactic structures and hesitations.

– We use a comparatively small amount of bilingual training data, namely about 500 000 running words for a vocabulary of 10 000 words (in the source language).

2 The Statistical Approach to Translation

2.1 Principle

The goal is the translation of a text given in some source language into a target language. We are given a source string $f_1^J = f_1 \ldots f_j \ldots f_J$, which is to be translated into a target string $e_1^I = e_1 \ldots e_i \ldots e_I$. In this paper, the term word always refers to a full-form word.

Figure 1. Architecture of the translation approach based on Bayes decision rule. [Diagram: the source language text is transformed, a global search maximizes $\Pr(f_1^J \mid e_1^I) \cdot \Pr(e_1^I)$ over $e_1^I$ using the lexicon model, the alignment model, and the language model, and a final transformation yields the target language text.]

Among all possible target strings, we will choose the string with the highest probability, which is given by Bayes' decision rule (Brown et al., 1993):

$\hat{e}_1^I = \operatorname{argmax}_{e_1^I} \Pr(e_1^I \mid f_1^J)$   (1)

$= \operatorname{argmax}_{e_1^I} \{\Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)\}$   (2)

$\Pr(e_1^I)$ is the language model (LM) of the target language, whereas $\Pr(f_1^J \mid e_1^I)$ is the string translation model. The argmax operation denotes the search problem, i.e. the generation of the output sentence in the target language.

The overall architecture of the statistical translation approach is summarized in Figure 1. In general, as shown in this figure, there may be additional transformations to make the translation task simpler for the algorithm. The transformations may range from the categorization of single words and word groups to more complex preprocessing steps that require some parsing of the source string. We have to keep in mind that in the search procedure both the language and the translation model are applied after the text transformation steps. However, to keep the notation simple, we will not make this explicit distinction in the subsequent exposition.

2.2 Basic Alignment Models

A key issue in modeling the string translation probability $\Pr(f_1^J \mid e_1^I)$ is the question of how we define the correspondence between the words of the target sentence and the words of the source sentence.

Figure 2. Manual alignment. [Example word alignment between the German sentence 'ja ich denke wenn wir das hinkriegen an beiden Tagen acht Uhr' and the English sentence 'well I think if we can make it at eight on both days'.]

In typical cases, we can assume a sort of pairwise dependence by considering all word pairs $(f_j, e_i)$ for a given sentence pair $(f_1^J; e_1^I)$. Here, we will further constrain this model by assigning each source word to exactly one target word. Later, this requirement will be relaxed. Models describing these types of dependencies are referred to as alignment models (Brown et al., 1993; Dagan et al., 1993; Kay and Röscheisen, 1993; Vogel et al., 1996).

When aligning the words in parallel texts (for Indo-European language pairs like Spanish-English, French-English, Italian-German, ...), we typically observe a strong localization effect. Figure 2 illustrates this effect for the language pair German-to-English. In many cases, although not always, there is an even stronger restriction: over large portions of the source string, the alignment is monotone.

To arrive at a quantitative specification, we first define the alignment mapping:

$j \to i = a_j$,

which assigns a word $f_j$ in position $j$ to a word $e_i$ in position $i = a_j$. The concept of these alignments is similar to the alignments introduced by Brown et al. (1993). By looking at such alignments, it is evident that the mathematical model should try to capture the strong dependence of $a_j$ on the preceding alignment. Therefore, for our ultimate model, the probability of alignment $a_j$ for position $j$ should have a dependence on the previous alignment position $a_{j-1}$:

$p(a_j \mid a_{j-1}, I)$

We can rewrite the string translation probability by introducing the 'hidden' alignments $a_1^J := a_1 \ldots a_j \ldots a_J$ for each sentence pair $(f_1^J; e_1^I)$:

$\Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)$

$= p(J \mid I) \cdot \sum_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j \mid a_{j-1}, I) \cdot p(f_j \mid e_{a_j}) \right]$

where we have included a sentence length probability $p(J \mid I)$. In the last two equations, the dependence has been confined to a first-order dependence.

Putting everything together, we have the following ingredients:

– the sentence length probability: $p(J \mid I)$, which is included here for completeness, but can be omitted without loss of performance;

– the lexicon probability: $p(f \mid e)$;
– the alignment probability: $p(a_j \mid a_{j-1}, I)$, which here has been chosen as a first-order model.

Rather than a first-order dependence, we can also use a zero-order model $p(a_j \mid j, I)$, where there is only a dependence on the absolute position index $j$ of the source string. For this zero-order model, it can be shown (Brown et al., 1993) that we have the following identity:

$\Pr(f_1^J \mid e_1^I) = p(J \mid I) \cdot \sum_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j \mid j, I) \cdot p(f_j \mid e_{a_j}) \right]$

$= p(J \mid I) \cdot \prod_{j=1}^{J} \sum_{i=1}^{I} \left[ p(i \mid j, I) \cdot p(f_j \mid e_i) \right]$

The sum in the last equation can be interpreted as a mixture-type distribution with mixture weights $p(i \mid j, I)$ and with component distributions $p(f_j \mid e_i)$ that model the pairwise dependencies between $f_j$ and $e_i$. Except for the missing "empty word", this model is identical to the so-called Model IBM-2 (Brown et al., 1993).

Assuming a uniform alignment probability

$p(i \mid j, I) = \frac{1}{I}\,,$

we arrive at the so-called IBM-1 model (Brown et al., 1993). The attractive property of the IBM-1 model is that, for maximum likelihood training (Brown et al., 1993), there is only one optimum and therefore the EM algorithm (Baum, 1972) always finds the global optimum.
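To make this concrete, the following sketch shows EM training of the IBM-1 lexicon probabilities $p(f \mid e)$ on a toy corpus. It is a minimal illustration of the model just described, not the Verbmobil implementation; all function and variable names are our own, and the empty word is omitted for brevity.

```python
from collections import defaultdict

def ibm1_em(corpus, iterations=10):
    """Estimate IBM-1 lexicon probabilities p(f|e) with EM.

    corpus: list of (source_words, target_words) sentence pairs.
    """
    # Uniform initialization of p(f|e) over the source vocabulary.
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        counts = defaultdict(float)   # expected counts N(f, e)
        totals = defaultdict(float)   # expected counts N(e)
        for fs, es in corpus:
            for f in fs:
                # Posterior alignment probabilities for source word f.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    counts[(f, e)] += delta
                    totals[e] += delta
        # M-step: re-estimate p(f|e) by expected relative frequency.
        for (f, e) in counts:
            t[(f, e)] = counts[(f, e)] / totals[e]
    return dict(t)

# Toy usage: two German-English sentence pairs.
corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split())]
t = ibm1_em(corpus)
print(round(t[("das", "the")], 3))  # converges towards 1.0
```

Because the IBM-1 likelihood has a single optimum, this loop reaches the same lexicon regardless of initialization.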

3 Alignment Template Approach

A general deficiency of the baseline alignment models is that they are only able to model correspondences between single words. Therefore, we will now consider whole phrases rather than single words as the basis for the alignment models. In other words, a whole group of adjacent words in the source sentence may be aligned with a whole group of adjacent words in the target language. As a result, the context of words has a greater influence, and the changes in word order from source to target language can be learned explicitly.

3.1 The word level alignment: alignment templates

The key element of the extended translation model is the alignment template. An alignment template is a triple $z = (F_1^{J'}, E_1^{I'}, \tilde{A})$ which describes the alignment $\tilde{A}$ between a source class sequence $F_1^{J'}$ and a target class sequence $E_1^{I'}$. The use of classes instead of the words themselves has the advantage of better generalization. If there exist classes in source and target language which contain all towns, it is possible that an alignment template learned using a special town can be generalized to all towns.

The classes used in $F_1^{J'}$ and $E_1^{I'}$ are automatically trained bilingual classes obtained using the method described in Och (1999) and constitute a partition of the vocabulary of source and target language. The class functions $\mathcal{F}$ and $\mathcal{E}$ map words to their classes.

The alignment $\tilde{A}$ is represented as a matrix with binary values. A matrix element with value 1 means that the words at the corresponding positions are aligned, and the value 0 means that the words are not aligned. If a source word is not aligned to a target word, then it is aligned to the empty word, which shall be at the imaginary position $i = 0$.

An alignment template $z = (F_1^{J'}, E_1^{I'}, \tilde{A})$ is applicable to a sequence of source words $f_1^{J'}$ if the alignment template classes and the classes of the source words are equal: $\mathcal{F}(f_1^{J'}) = F_1^{J'}$. The application of the alignment template constrains the target words $e_1^{I'}$ to correspond to the target class sequence: $\mathcal{E}(e_1^{I'}) = E_1^{I'}$.

The application of an alignment template does not determine the target words, but only constrains them. For the selection of words from classes we use a statistical model for $p(\tilde{f} \mid z, \tilde{e})$ based on the lexicon probabilities of a statistical lexicon $p(f \mid e)$. We assume a mixture alignment between the source and target language words constrained by the alignment matrix $\tilde{A}$:

$p(\tilde{f} \mid z, \tilde{e}) = \prod_{j=1}^{J'} p(f_j \mid \tilde{e}, \tilde{A})$   (3)

$p(f_j \mid \tilde{e}, \tilde{A}) = \sum_{i=0}^{I'} p(i \mid j, \tilde{A}) \cdot p(f_j \mid e_i)$   (4)

$p(i \mid j, \tilde{A}) = \frac{\tilde{A}(i, j)}{\sum_{i'=0}^{I'} \tilde{A}(i', j)}$   (5)
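As a concrete reading of Equations (3)-(5), the sketch below scores a source phrase against a target phrase under a given binary alignment matrix. It is a minimal illustration with our own naming; the lexicon is passed in as a plain dictionary, and every source position is assumed to carry at least one link (unaligned source words link to the empty word at row 0).

```python
def phrase_prob(f_words, e_words, A, lex):
    """Mixture model p(f_phrase | z, e_phrase) of Eqs. (3)-(5).

    f_words: source phrase; e_words: target phrase, with e_words[0]
    being the empty word; A[i][j] is 1 if positions i, j are aligned;
    lex[(f, e)] is the lexicon probability p(f|e).
    """
    prob = 1.0
    for j, f in enumerate(f_words):
        # Eq. (5): normalize column j of the alignment matrix.
        col = [A[i][j] for i in range(len(e_words))]
        norm = sum(col)  # assumed > 0: empty word catches stray words
        # Eq. (4): mixture over the target words linked to position j.
        p_fj = sum((col[i] / norm) * lex.get((f, e), 0.0)
                   for i, e in enumerate(e_words))
        prob *= p_fj     # Eq. (3): product over source positions
    return prob
```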

3.2 The phrase level alignment

In order to describe the phrase level alignments in a formal way, we first decompose both the source sentence $f_1^J$ and the target sentence $e_1^I$ into a sequence of phrases ($k = 1, \ldots, K$):

$f_1^J = \tilde{f}_1^K, \quad \tilde{f}_k = f_{j_{k-1}+1} \ldots f_{j_k}$

$e_1^I = \tilde{e}_1^K, \quad \tilde{e}_k = e_{i_{k-1}+1} \ldots e_{i_k}$

In order to simplify the notation and the presentation, we ignore the fact that there can be a large number of possible segmentations and assume that there is only one segmentation. In the previous section, we have described the alignment within the phrases. For the alignment $\tilde{a}_1^K$ between the source phrases $\tilde{f}_1^K$ and the target phrases $\tilde{e}_1^K$, we obtain the following equation:

$\Pr(\tilde{f}_1^K \mid \tilde{e}_1^K) = \sum_{\tilde{a}_1^K} \prod_{k=1}^{K} p(\tilde{a}_k \mid \tilde{a}_{k-1}) \cdot p(\tilde{f}_{\tilde{a}_k} \mid \tilde{e}_k)$

For the phrase level alignment we use a first-order alignment model $p(\tilde{a}_k \mid \tilde{a}_{k-1})$ which is in addition constrained to be a permutation of the phrases.

For the translation of one phrase, we introduce the alignment template $z$ as an unknown variable:

$p(\tilde{f} \mid \tilde{e}) = \sum_{z} p(z \mid \tilde{e}) \cdot p(\tilde{f} \mid z, \tilde{e})$   (6)

The probability $p(z \mid \tilde{e})$ to apply an alignment template is estimated by relative frequencies (see next section). The probability $p(\tilde{f} \mid z, \tilde{e})$ is decomposed by Equation (3).

3.3 Training

The training of the alignment templates requires the following steps: First, we train two word-based alignment models for the two translation directions $\Pr(f_1^J \mid e_1^I)$ and $\Pr(e_1^I \mid f_1^J)$ by applying the EM algorithm. In this step, any of the basic alignment models can be used. For each translation direction we calculate the Viterbi alignment of the translation models determined in the previous step. Thus we get two alignment vectors $a_1^J$ and $b_1^I$ for each sentence pair.

We increase the quality of the alignments by combining the two alignment vectors into one alignment matrix $A$ using the following method.

$A_1 = \{(a_j, j)\}$ and $A_2 = \{(i, b_i)\}$ denote the sets of links in the two Viterbi alignments. In a first step, the intersection $A = A_1 \cap A_2$ is determined. The elements within $A$ are justified by both Viterbi alignments and are therefore very reliable. We now extend the alignment $A$ iteratively by adding links $(i, j)$ occurring only in $A_1$ or in $A_2$ if they have a neighboring link already in $A$ or if neither the word $f_j$ nor the word $e_i$ is aligned in $A$. The link $(i, j)$ has the neighboring links $(i-1, j)$, $(i, j-1)$, $(i+1, j)$, and $(i, j+1)$. This enhanced alignment $A$ is now used to obtain the parameters required for the alignment template approach.
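The following sketch spells out this combination heuristic. It is a minimal reading of the description above, with our own function name; link sets are represented as Python sets of (i, j) pairs.

```python
def combine_alignments(A1, A2):
    """Merge two Viterbi alignments (sets of (i, j) links) into one
    alignment matrix, starting from the reliable intersection."""
    A = A1 & A2                      # links confirmed by both directions
    candidates = (A1 | A2) - A       # links seen in only one direction
    added = True
    while added:                     # iterate until no link can be added
        added = False
        for (i, j) in sorted(candidates - A):
            has_neighbor = any(
                (i + di, j + dj) in A
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)))
            i_free = all(ii != i for (ii, jj) in A)  # e_i still unaligned
            j_free = all(jj != j for (ii, jj) in A)  # f_j still unaligned
            if has_neighbor or (i_free and j_free):
                A.add((i, j))
                added = True
    return A
```

The loop terminates because the candidate set is finite and each pass either adds a link or stops.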

The bilingual word lexicon $p(f \mid e)$ is estimated by the relative frequencies of the alignment determined in the previous step:

$p(f \mid e) = \frac{N_A(f, e)}{N(e)}$   (7)

Here $N_A(f, e)$ is the frequency that the word $f$ is aligned to $e$, and $N(e)$ is the frequency of $e$ in the training corpus.

We determine correlated bilingual word classes for source and target language by using the method described in Och (1999). The basic idea of this method is to apply a maximum-likelihood approach to the joint probability of the parallel training corpus. The resulting optimization criterion for the bilingual word classes is similar to the one used in monolingual maximum-likelihood word clustering.

Finally, we have to collect the alignment templates. To do so, we count all phrase pairs of the training corpus which are consistent with the enhanced alignment matrix. A phrase pair is consistent with the alignment if the words within the source phrase are aligned only to words within the target phrase. Thus we obtain a count $N(z)$ of how often an alignment template $z$ occurred in the aligned training corpus. The probability of using an alignment template needed by Equation (6) is estimated by relative frequency:

$p(z \mid \tilde{e}) = \frac{N(z)}{N(\mathcal{E}(\tilde{e}))}$   (8)
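The consistency criterion can be made concrete with a short sketch: a source span and a target span form a valid phrase pair if no alignment link leaves the rectangle they define. The code below is an illustrative extraction loop with our own naming, not the original implementation; it enumerates all spans without pruning and is therefore only suitable for short sentences.

```python
def extract_phrase_pairs(A, I, J, max_len=5):
    """Enumerate phrase pairs consistent with alignment matrix A.

    A: set of (i, j) links with target positions i in [0, I) and
    source positions j in [0, J).
    Yields ((j1, j2), (i1, i2)) span pairs, boundaries inclusive.
    """
    for j1 in range(J):
        for j2 in range(j1, min(j1 + max_len, J)):
            for i1 in range(I):
                for i2 in range(i1, min(i1 + max_len, I)):
                    src_links = [(i, j) for (i, j) in A if j1 <= j <= j2]
                    tgt_links = [(i, j) for (i, j) in A if i1 <= i <= i2]
                    # Consistent: every link leaving the source span
                    # stays inside the target span, and vice versa.
                    if (src_links
                            and all(i1 <= i <= i2 for (i, j) in src_links)
                            and all(j1 <= j <= j2 for (i, j) in tgt_links)):
                        yield ((j1, j2), (i1, i2))
```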

Figure 3 shows some of the extracted alignment templates. The extraction algorithm does not perform a selection of good or bad alignment templates; it simply extracts all possible alignment templates. Actually, only the maximal alignment templates are shown. Other, smaller templates extracted from that alignment include: 'wie sieht es' – 'how about', 'am neunzehnten' – 'the nineteenth', and 'nachmittags' – 'in the afternoon'.

Figure 3. Example of a word alignment and some learned alignment templates. [Alignment between the German sentence 'okay , wie sieht es am neunzehnten aus , vielleicht um zwei Uhr nachmittags ?' and the English sentence 'okay , how about the nineteenth , at maybe two o'clock in the afternoon ?'.]

3.4 Search

For decoding we use the following search criterion:

$\hat{e}_1^I = \operatorname{argmax}_{e_1^I} \{\Pr(e_1^I) \cdot \Pr(e_1^I \mid f_1^J)\}$   (9)

This decision rule is an approximation to Equation (1), which would use the translation probability $\Pr(f_1^J \mid e_1^I)$. Using this simplification, it is easy to integrate the translation and language model in the search process, as both models predict target words. As experiments have shown, this simplification does not affect the quality of the translation results.

To allow the influence of long contexts, we use a class-based five-gram language model with backing-off.

In Figure 4 the decisions taken during the search process are shown. First, the source sentence words are mapped onto their word classes. Those alignment templates matching part of the word class sequence are selected. Reorderings of the alignment templates are possible to allow for global word reordering. The alignment templates generate a sequence of target word classes. In the final step, the actual word sequence is generated. During this step, the target language model and the lexicon probabilities are taken into account to score the translation hypothesis. In Figure 4, $ denotes the sentence start/end marker.

In search we produce partial hypotheses, each of which contains the following information:

Figure 4. Decisions during the search process. [Diagram: the source words $f_1 \ldots f_7$ are mapped to word classes and covered by alignment templates $z_1 \ldots z_4$, which generate the target words $e_1 \ldots e_6$ between the sentence start/end markers $.]

1. the last target word produced,
2. the state of the language model (the classes of the last four target words),
3. a bit-vector representing the already covered positions of the source sentence,
4. a reference to the alignment template instantiation which produced the last target word,
5. the position of the last target word in the alignment template instantiation,
6. the accumulated costs (the negative logarithm of the probabilities) of all previous decisions,
7. a reference to the previous partial hypothesis.

A partial hypothesis is extended by appending one target word. The set of all partial hypotheses can be structured as a graph, with a source node representing the sentence start, leaf nodes representing full translations, and intermediate nodes representing partial hypotheses. We recombine partial hypotheses which can be distinguished by neither the language model nor the translation model. When the elements 1-5 of two partial hypotheses do not allow us to distinguish between them, it is possible to drop the hypothesis with the higher costs from the subsequent search process.
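A compact way to picture this bookkeeping is a record type whose first five fields form the recombination key. The sketch below uses our own field names; the Verbmobil data structures are not published in this form.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Hypothesis:
    last_word: str                 # 1. last target word produced
    lm_state: Tuple[str, ...]      # 2. classes of the last four target words
    coverage: int                  # 3. bit-vector of covered source positions
    template_id: int               # 4. alignment template instantiation
    template_pos: int              # 5. position within that instantiation
    costs: float                   # 6. accumulated negative log-probability
    predecessor: Optional["Hypothesis"] = None  # 7. back-pointer

    def recombination_key(self):
        # Hypotheses sharing this key cannot be distinguished by the
        # language or translation model; keep only the cheapest one.
        return (self.last_word, self.lm_state, self.coverage,
                self.template_id, self.template_pos)

def recombine(hypotheses):
    best = {}
    for h in hypotheses:
        k = h.recombination_key()
        if k not in best or h.costs < best[k].costs:
            best[k] = h
    return list(best.values())
```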

We also use beam search in order to handle the huge search space. In beam search we compare hypotheses which cover different parts of the input sentence. This makes the comparison of the costs somewhat problematic. Therefore we integrate an (optimistic) estimation of the remaining costs to arrive at a full translation. This can be done efficiently by determining in advance, for each word in the source language sentence, a lower bound for the costs of the translation of this word. Together with the bit-vector stored in a partial hypothesis, an efficient estimation of the remaining costs is possible.
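Under this scheme, the rest cost of a hypothesis is simply the sum of precomputed per-word lower bounds over the source positions not yet covered. A minimal sketch, assuming the bit-vector coverage from the hypothesis record above and a dictionary-style lexicon:

```python
import math

def word_cost_bounds(source_words, lex):
    """Precompute, for each source word, an optimistic translation
    cost: the best (lowest) negative log lexicon probability."""
    return [min((-math.log(p)
                 for (f, e), p in lex.items() if f == w and p > 0),
                default=0.0)
            for w in source_words]

def rest_cost(coverage, bounds):
    """Sum the lower bounds over all source positions not yet covered;
    coverage is a bit-vector (bit j set = position j translated)."""
    return sum(b for j, b in enumerate(bounds)
               if not (coverage >> j) & 1)
```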


4 System Integration

The statistical approach to machine translation is embodied in the stattrans-module, which is integrated into the Verbmobil system. The implementation allows for translating from German to English and from English to German. In normal processing mode, the stattrans-module gets its input from the repair-module. At that time, the word lattices and best hypotheses from the speech recognition systems have been prosodically annotated. Translation is performed on the best hypothesis of the recognizer.

The prosodic boundaries and mode information are utilized by stattrans using a very simple heuristic. If there is a major phrase boundary, a full stop or question mark is inserted into the word sequence, depending on the sentence mode as indicated by the prosody-module. Additional commas are inserted for other types of segment boundaries. As the prosody-module gives probabilities for segment boundaries, thresholds are used to decide whether the sentence marks are to be inserted. These thresholds were selected to give, on average, a good segmentation of the input. The segment boundaries restrict possible word reordering between source and target language. This not only improves translation quality but also restricts the search space, thereby speeding up translation.
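Read literally, the heuristic amounts to thresholding per-boundary probabilities and inserting a mark chosen by the predicted sentence mode. The sketch below is our own paraphrase with invented threshold values; the actual thresholds were tuned on Verbmobil data and are not given in the text.

```python
def insert_sentence_marks(words, boundaries, major_th=0.7, minor_th=0.5):
    """Insert punctuation according to prosodic boundary probabilities.

    boundaries: per word position, a (p_major, p_minor, mode) triple,
    where mode is 'question' or 'statement' for major boundaries.
    The threshold values here are placeholders, not the tuned ones.
    """
    out = []
    for word, (p_major, p_minor, mode) in zip(words, boundaries):
        out.append(word)
        if p_major > major_th:
            out.append("?" if mode == "question" else ".")
        elif p_minor > minor_th:
            out.append(",")
    return out
```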

The output of the stattrans-module is the translation as a plain string, together with a confidence measure required by the selection-module. The score for the translation, which results from the accumulation of the log-probabilities from the different knowledge sources combined in the search process, is normalized to the sentence length and mapped into the interval $[0, 1]$.

The stattrans-module uses approximately 200 MB of memory, mainly to store the alignment templates, the lexicons, and the language models for the two translation directions. Translation speed is very high, typically only a few tenths of a second even for longer turns. In the overall Verbmobil system, the processing time used by the stattrans-module is about …

5 Experimental Results

5.1 The Task and the Corpus

The statistical translation approach was tested on the Verbmobil corpus. The transliterations of the recorded dialogs have been translated by Verbmobil partners (Hildesheim for Phase I and Tübingen for Phase II). As different translators were involved, there is great variability in the translations.

The turns are sometimes rather long and may consist of several sentences. To prepare the training corpus, these turns were split into shorter segments using sentence marks as potential split points. As the sentence marks do not always coincide, a dynamic programming approach was used to find the optimal segmentation points. Source segments can be aligned to target segments. This alignment is scored using a word-based alignment model. That segmentation of a sentence pair is used which gives the best overall score. Additional restrictions are applied to avoid segment pairs with a very low score. The translation and language models were then trained on the segmented corpus.

An official vocabulary had been agreed upon for the speech recognizers. However, not all of these vocabulary items are covered by the training corpus. Therefore, an additional lexicon was constructed semi-automatically. Online lexicons were used to extract translations for words missing in the training corpus. Some additions had to be made manually. The resulting lexicon contained not only word-to-word items but also multi-word translations, especially for the large number of German compounds.

In Table 1 the characteristics of the training and test sets are summarized.

Table 1. Training and test corpus.

                           German    English
Train    Sentences              58 332
         Words                519 523   549 921
         Vocabulary             7 940     4 673
Lexicon  Items                  12 779
         Words                 15 101    18 213
         ex. Vocabulary        11 501     6 867
Test     Sentences                 147
         Words                  1 968     2 173
         Trigram PP            (40.3)      28.8

5.2 Alignment Quality

We measure the quality of the above-mentioned alignment models with respect to alignment quality and translation quality.

To get a reference alignment, we manually aligned about 1.4 percent of our training corpus. It is well known that manually performing a word alignment is a complicated and ambiguous task (Melamed, 1998). Therefore, we allowed the annotators who performed the alignment to specify two different kinds of alignments: an S (sure) alignment, which is used for alignments which are unambiguous, and a P (possible) alignment, which is used for alignments which might or might not exist. The P relation is used especially to align words within idiomatic expressions, free translations, and missing function words. Figure 5 shows an example of a manually aligned sentence with S and P relations.

The quality of an alignment $A$ is then measured against the sure alignments $S$ and the possible alignments $P$ ($S \subseteq P$) using the following error rate:

$\mathrm{ERR}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}$
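This error rate is straightforward to compute from link sets; a minimal sketch with our own function name, representing alignments as sets of (i, j) pairs:

```python
def alignment_error_rate(A, S, P):
    """Error rate of alignment A against sure links S and possible
    links P (S is assumed to be a subset of P)."""
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

# Toy usage: both hypothesized links are at least possible,
# and the single sure link is found, so the error rate is zero.
A = {(1, 1), (2, 2)}
S = {(1, 1)}
P = {(1, 1), (2, 2), (2, 3)}
print(alignment_error_rate(A, S, P))  # 0.0
```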

Figure 5. Example of a manual alignment with sure (filled dots) and possible connections. [Alignment between the German sentence 'ja , dann würde ich sagen , verbleiben wir so .' and the English sentence 'yes , then I would say , let us leave it at that .']

The reference alignment does not prefer any translation direction (it is symmetric) and contains many-to-one and one-to-many relationships. Therefore, the Viterbi alignments of the baseline alignment models will not have zero errors.

The following table shows the alignment quality of different alignment models on the Verbmobil task:

                   Alignment errors [%]
Dictionary         no      no      yes
Empty word         no      yes     yes
Model 1            17.8    16.9    16.0
Model 2 (diag)     12.7    11.7    10.6
HMM                11.7     9.9     9.2
Model 4             9.2     7.9     6.6

We conclude that more refined alignment models are crucial for good alignment quality. In particular, the use of a first-order alignment model and the modeling of the empty word and of fertilities are important. The improvement obtained by using a dictionary is small compared to the effect of proper statistical modelling.


5.3 Translation Results

Performance Measures for Translation Quality. We measure the translation quality using two different criteria:

– Word Error Rate (WER): The edit distance (number of insertions, deletions, and substitutions) between the produced translation and one predefined reference translation is calculated. The edit distance has the great advantage of being automatically computable; as a consequence, the results are inexpensive to obtain and reproducible, because the underlying data and the algorithm are always the same. The great disadvantage of the WER is the fact that it depends fundamentally on the choice of the sample translation and that it does not take into account how serious different errors are for the meaning of the translation.

– Subjective Sentence Error Rate (SSER): The translations are classified into a small number of quality classes, ranging from "perfect" to "absolutely wrong". In comparison to the WER, this criterion is more reliable and conveys more information, but measuring the SSER is expensive, as it is not computed automatically but is the result of laborious evaluation by human experts. The SSER is used, e.g., in Nießen et al. (1998).

To support the assignment of the subjective error scores and to guarantee high consistency, an evaluation tool has been developed which displays already evaluated translations along with the new translation and also allows for an extrapolation of the SSER by finding nearest matches to former evaluations stored in a database (Nießen et al., 2000; Vogel et al., 2000).
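The WER is the standard Levenshtein distance at the word level, normalized by the reference length. A minimal sketch of the computation (our own implementation, not the evaluation tool mentioned above):

```python
def word_error_rate(hypothesis, reference):
    """Word-level Levenshtein distance divided by reference length."""
    h, r = hypothesis.split(), reference.split()
    # d[i][j]: edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                 # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                 # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# One insertion and one substitution against a 5-word reference: 0.4.
print(word_error_rate("we will go to the theater", "we go to the theatre"))
```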

Effect of Preprocessing. There are a number of problems for the statistical approach to translation which can mainly be attributed to the sparse data problem. Many words and many syntactic constructions are seen only once in the training data. For these it is often not possible to train reliable alignments. However, in some cases the problems can be lessened by appropriate preprocessing steps. Most important for the Verbmobil task is the handling of numbers in time expressions like 'halb zehn', to be translated as 'half past nine'. Therefore, simple substitution rules are used to normalize such expressions in the training corpus. The same substitutions are applied online for the test corpus. The effect of this preprocessing is given in Table 2. Whereas the WER shows only a small improvement, the translation quality as measured by the subjective sentence error rate improves clearly.

Table 2. Effect of preprocessing on translation quality.

                     WER [%]   SSER [%]
no preprocessing       50.6      22.9
preprocessing          48.6      16.8
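The substitution rules mentioned above can be pictured as simple pattern replacements applied identically to training and test data. The rule below is an invented example in the spirit of the 'halb zehn' case, not the actual Verbmobil rule set: it rewrites 'halb <hour>' so that the German side aligns monotonically with English 'half past <hour-1>'.

```python
import re

# Hypothetical number tables; the real rule set is not listed in the text.
GERMAN_NUMBERS = {"eins": 1, "zwei": 2, "drei": 3, "vier": 4, "fünf": 5,
                  "sechs": 6, "sieben": 7, "acht": 8, "neun": 9,
                  "zehn": 10, "elf": 11, "zwölf": 12}
NUMBERS_GERMAN = {v: k for k, v in GERMAN_NUMBERS.items()}

def normalize_times(sentence):
    """Rewrite 'halb <hour>' as 'halb nach <hour-1>' (hypothetical rule)."""
    def repl(m):
        word = m.group(1)
        if word not in GERMAN_NUMBERS or GERMAN_NUMBERS[word] - 1 not in NUMBERS_GERMAN:
            return m.group(0)       # leave unknown expressions untouched
        return "halb nach " + NUMBERS_GERMAN[GERMAN_NUMBERS[word] - 1]
    return re.sub(r"\bhalb (\w+)\b", repl, sentence)

print(normalize_times("wir treffen uns um halb zehn"))
# -> wir treffen uns um halb nach neun
```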


Effect of Alignment Models. It has already been shown that stronger alignment models result in improved alignment quality. How this affects the translation quality can be seen in Table 3. The translations were produced for text input.

Table 3. Effect of alignment model on translation quality.

Model       WER [%]   SSER [%]
IBM-1         49.8      22.2
HMM           47.4      19.3
inv. HMM      48.6      16.8

The improvement is due to better lexicons and better alignment templates extracted from the resulting alignments. The search process and also the preprocessing were the same for all three runs.

5.4 Translation Examples

Disambiguation. For the statistical approach to translation, no explicit information about the different meanings of words is stored. Rather, this information has to be extracted from the corpus and stored in the alignment templates, the lexicon, and the language model in an implicit way. The following examples show that in many cases the context stored in this way allows for correct disambiguation. The first two groups of sentences contain the verbs 'gehen' and 'annehmen', which have different translations. Some of these examples are rather collocational. Correct translation is only possible by taking the whole phrase into account. The last two sentences show the disambiguation of prepositions, with the example of temporal and locational 'vor'.

Table 4. Disambiguation examples.

           Input                                 Translation
gehen      Wir gehen ins Theater .               we will go to the theater .
           Mir geht es gut .                     I am fine .
           Es geht um Geld .                     it is about money .
           Geht es bei Ihnen am Montag ?         is it possible for you on Monday ?
           Das Treffen geht bis 5 Uhr .          the meeting is to five .
annehmen   Wir sollten das Angebot annehmen .    we should accept that offer .
           Ich nehme das Schlimmste an .         I will take the worst .
vor        Wir treffen uns vor dem Frühstück .   we meet before the breakfast .
           Wir treffen uns vor dem Hotel .       we will meet in front of the hotel .

The translation of 'Ich nehme das Schlimmste an .' as 'I will take the worst .' shows the problem of long-distance dependencies. In this case, the strong connection between the words 'nehme' and 'an' was not captured by the alignment templates. This can be improved with additional morphosyntactic preprocessing which transforms 'nehme ... an' into 'annehme'. Training and testing on corpora with this preprocessing will produce the translation 'I suppose the worst'.

Note also that 'Wir treffen uns' in the last two sentences gets two different translations, both of which are correct. This demonstrates that the sentences are not translated segment by segment with the results simply concatenated; rather, the target sentence is the result of an overall search process combining different knowledge sources.

Examples from the Test Corpus. In Table 5 we give some translation examples taken from the test corpus used for our internal evaluation. Translations were produced on text and on speech input.

Table 5. Examples from the test-147 corpus.

Text:   wie wäre es denn mit dem achtzehnten , weil ich am siebzehnten noch verhindert bin .
        how about the eighteenth , because I am still booked on the seventeenth .
Speech: wie wäre es denn mit dem achtzehnten , weil ich am siebzehnten noch verhindert , dann
        how about the eighteenth , because I still booked on the seventeenth then .

Text:   sehr gut , ja . dann fahren wir da los . alles klar . danke schön .
        very good , yes . then we will go then leave . all right . thank you .
Speech: sehr gut , ja ich dann fahren wir da uns , alles klar dann schon
        very good , well then we will go then I us , all right then already .

Text:   Mittwoch , den sechsten , geht nicht . äh Montag , der elfte .
        Wednesday , the sixth , isn't possible . äh Monday , the eleventh .
Speech: wie Mittwoch den sechsten geht , nicht , Montag , der elfte ?
        how is , not Wednesday the sixth , Monday , the eleventh ?

Text:   ah , ja , ja , die haben einen guten Service .
        uh , well , well , they have a good service .
Speech: ah , ja , die ja guten Service .
        oh , yes , good yes the service .

Text:   genau , das wäre dann eine Übernachtung .
        exactly , then , that would be an overnight stay .
Speech: genau , das wäre dann eine Übernachtung .
        exactly , then , that would be an overnight stay .

Examples from End-to-End Evaluation Dialogs. During March and April 2000, an end-to-end evaluation of the Verbmobil system was performed by the Verbmobil partners at the University of Hamburg (Tessiore and Hahn, 2000). In this evaluation, two native speakers conducted a dialog, having no direct contact but only listening to the translations synthesized by the Verbmobil system. From the log files of these dialogs, the best hypothesis of the speech recognizer for each turn and the translation produced by the stattrans-module were extracted. In Table 6, some turns from dialog 000412-152011 are shown: first the transliteration of the spoken sentence, next the output of the speech recognizer, and finally the translation produced by the stattrans-module.

Table 6. Translation examples from the end-to-end evaluation.

Guten Tag , fahren wir mit dem Auto nach Hannover ?
guten Tag fahren wir mit dem Auto nach Hannover
hello , let's go by car , to Hanover ,

Okay , that's fine . What time should we leave on the fourth of December ?
I okay that is fine what time should we leave on the fourth of December
ich , okay , das ist in Ordnung , um wieviel Uhr , sollten wir losfliegen , am vierten Dezember ,

Wir sollten um zehn Uhr morgens losfahren , damit wir um zwölf Uhr in Hannover sind .
wir sollten um zehn Uhr morgens losfahren damit wir um zwölf Uhr in Hannover sind
we should at ten o'clock in the morning , so that we leave at twelve o'clock , will be in Hanover .

Okay , that's fine . Let's meet at ten o'clock in the morning .
okay that is fine let us meet at ten o'clock in the morning
okay , das ist in Ordnung , dann treffen wir uns , um zehn Uhr , in der Früh ,

Soll ich ein Hotel buchen ?
soll ich ein Hotel buchen
shall I book a hotel ,

Okay , I like the Queens-hotel , could you reserve two single rooms with the shower ?
I okay I I like the UNK:Hotel.kvi:nsho:tEl.Queens-Hotel could you reserve two single rooms what the UNK:Hotel.Sa:vOY.Savoy
ich , okay , ich mag den , könnten Sie UNK:Hotel.kvi:nsho:tEl.Queens-Hotel zwei Einzelzimmer reservieren wie das UNK:Hotel.Sa:vOY.Savoy .

Ich buche in dem Koenigshotel zwei Einzelzimmer mit Dusche .
ich buche in dem UNK:Surname.k2:.KOEH zwei Einzelzimmer mit Dusche
I will book in the UNK:Surname.k2:.KOEH two single room with shower .

That's fine . What is the name of the hotel ?
that is fine what is the name of that hotel
das ist in Ordnung , wie ist der Name dieses Hotel .


6 Summary

References

Alshawi, H., Bangalore, S., and Douglas, S. (1998). Automatic Acquisition of Hierarchical Transduction Models for Machine Translation. In Proc. 36th Annual Conference of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, 41–47.

Baum, L. (1972). An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes. Inequalities 3:1–8.

Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2):263–311.

Dagan, I., Church, K., and Gale, W. A. (1993). Robust bilingual word alignment for machine aided translation. In Proceedings of the Workshop on Very Large Corpora, 1–8.

Fung, P., and Church, K. W. (1994). K-vec: A New Approach for Aligning Parallel Texts. In COLING '94: The 15th Int. Conf. on Computational Linguistics.

Jelinek, F. (1976). Continuous Speech Recognition by Statistical Methods. In Proc. of the IEEE, Vol. 64, No. 10, 532–556.

Kay, M., and Röscheisen, M. (1993). Text-translation alignment. Computational Linguistics 19(1):121–142.

Melamed, I. D. (1998). Manual annotation of translational equivalence: The Blinker project. Technical Report 98-07, IRCS.

Nießen, S., Vogel, S., Ney, H., and Tillmann, C. (1998). A DP based search algorithm for statistical machine translation. In Proc. 36th Annual Conference of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, 960–967.

Nießen, S., Och, F. J., Leusch, G., and Ney, H. (2000). An evaluation tool for machine translation: Fast evaluation for MT research. In Proceedings of LREC, Athens, Greece, May.

Och, F. J. (1999). An efficient method to determine bilingual word classes. In EACL '99: Ninth Conf. of the Europ. Chapter of the Association for Computational Linguistics.

Tessiore, L., and Hahn, W. v. (2000). Functional End-to-End Evaluation of an MT System: Verbmobil. In this volume.

Vogel, S., Ney, H., and Tillmann, C. (1996). HMM-based word alignment in statistical translation. In COLING '96: The 16th Int. Conf. on Computational Linguistics, 836–841.

Vogel, S., Nießen, S., and Ney, H. (2000). Automatic extrapolation of human assessment of translation quality. In Proceedings of the Workshop on the Evaluation of Machine Translation, Athens, Greece, May.
