arXiv:2111.05988v1 [cs.IR] 10 Nov 2021

Cross-language Information Retrieval

March 1, 2021

Petra Galuščáková,1 Douglas W. Oard,1,2 Suraj Nair1,3

1UMIACS, 2College of Information Studies, 3Computer Science Department
University of Maryland, College Park

[email protected]

Abstract. Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find. When the documents to be searched are in a language not known by the searcher, neither assumption is true. In such cases, Cross-Language Information Retrieval (CLIR) is needed. This chapter reviews the state of the art for cross-language information retrieval and outlines some open research questions.

Keywords: Cross-Language Information Retrieval · Machine Translation · Evaluation.

1 Introduction

The goal of Cross-Language Information Retrieval (CLIR)1 is to build search engines that use a query expressed in one language (e.g., English) to find content that is expressed in some other language (e.g., French). CLIR and Machine Translation (MT, automatically rendering the content of a document in another language) are closely linked in another way; what we call MT is the use of translation technology to render documents readable, whereas CLIR is the use of translation technology to render documents searchable. So CLIR and MT are both built on the same foundation: translation technology. While our focus in this survey is on retrieval of documents in one language using queries in another language, in practical applications the collection to be searched can potentially contain documents in many different languages, or even documents that themselves contain mixed languages, and the query too may contain terms from different languages.2

1 The astute reader might ask why the field is not referred to as “cross-lingual” information retrieval. At the start of the second wave, participants in the SIGIR 1996 workshop voted on a name for the field, and cross-language won. Language is, however, created through use, and cross-lingual is today also a commonly used form.

2 Sometimes such cases have been referred to as Multilingual Information Retrieval (MLIR) [139], although that terminology is inconsistently used. For example, many authors have used MLIR to refer to information retrieval systems such as Google that are able to handle many languages, even when no support for cross-language search is provided.


1.1 Some Use Cases

The use cases of CLIR systems can be divided into two groups: (1) cases in which the user doesn't speak the language of the documents, and (2) cases in which the user understands the documents but prefers to use a different language for their query. These use cases, together with other factors such as relatedness of the languages, frequency of their interaction, collection size and format, or the expected number of queries, then influence the design choices which need to be considered when building a CLIR system. Here we describe several scenarios in which CLIR systems can be useful.

– In Web search, the natural emergence of one or more “lingua franca” languages such as Chinese or English results in an asymmetry in which much of the information is in some small number of languages, but many users prefer to (or must) interact with the Web in some other language. In such cases, CLIR can provide access to information that would otherwise be unavailable.

– Recall-oriented CLIR applications, which include searching in medical documents [147, 75, 136, 69, 77, 156], patents [106], and security applications [30]. The end users of these systems are often professionals for whom finding all documents on a defined topic is a high priority. It can be expected that the end users in such settings might have the ability to pay for high-quality human translation of the relevant documents. Even in such cases, however, automatic translations might provide a fast overview of the search results, and they might also be helpful for deciding whether individual documents require human translation. In such cases, a summary of a document in the language of the query might be useful.

– CLIR applications can be expected to be particularly important in regions where multiple languages are frequently used, such as in countries that have more than one official language. This might include English/French in Canada, Dutch/French/German in Belgium, Spanish/Catalan/Basque in Spain, the 24 official languages in the European Union, or the 23 official languages in India. It can be expected that substantial amounts of parallel text would often be available in such settings, either through commercial practice or to meet legal requirements. Users of CLIR systems in such settings might be able to understand multiple languages, but they may be more fluent in, or more comfortable using, one language.

– Question answering (QA) is a special case of Information Retrieval (IR) in which, rather than simply identifying relevant documents to the user, the system seeks to use information from relevant documents to provide a direct answer to a question. Cross-Language Question Answering (CLQA) has received considerable attention among researchers [56, 104, 33, 149], and with the recent attention to conversational search and chatbots it might be expected that this topic could attract continued attention in the coming years.


1.2 The Three Waves of Information Retrieval

We live today in the third wave of research on automated approaches to Information Retrieval (IR). In the first wave, documents were typically described by controlled vocabulary descriptors (which are one example of what we today call metadata) and queries were expressed using the same descriptors. These descriptors were typically organized hierarchically (e.g., aircraft is a narrower term than vehicle) in a thesaurus, and cross-language search could thus be supported by adding entry vocabulary in other languages (e.g., by linking the French free text term avion to the controlled vocabulary term aircraft) [131]. The driving technology of that first wave was disk storage, and the driving need was to index large collections of published materials in libraries, and large collections of unpublished technical documents; the American Documentation Institute was an early venue for research on thesaurus-based access to large collections.3

In second-wave IR (1990-today), documents are typically represented using the terms they contain (i.e., full-text indexing), searchers are no longer limited to thesaurus terms in their queries and instead can use any terms they wish, and retrieved documents are ranked based on the degree to which they are estimated by the system to satisfy the query. A driving force behind the emergence of the second wave was the need to provide search engines for the World-Wide Web, and the ACM Special Interest Group on Information Retrieval (SIGIR) emerged as a focal community for that work.

In second-wave IR research, the usual approach was to represent queries and documents similarly, and then to compare the queries to the documents. Examples of this included the vector space model and the query likelihood model. Recently, however, a third wave has emerged, with different representations for queries and for documents. The defining technology for this new approach is the neural transformer architecture, through which query and document terms learn to attend to each other.

Each of these waves of IR research has shaped a generation of CLIR research. Our goal in this chapter is to present a concise view of CLIR as seen from the third decade of the twenty-first century. It is not possible to review every detail, but neither is that necessary, since several useful overviews already exist. Particularly notable in that regard is Nie's excellent survey, penned a decade ago [125]. Here, therefore, we pay particular attention to influential work published since Nie's 2010 survey. Along the way, our presentation supplements four other surveys that have been published since Nie's survey [40, 73, 199, 203].

2 The Core Technology of CLIR

CLIR research came of age in the second wave, when the key question was how queries and documents that were expressed using different languages could be represented similarly. Because that work still shapes our view of CLIR, we first introduce how that was done. We organize our story around three major questions: (1) ‘what to translate?’, (2) ‘which translations are possible?’, and (3) ‘how should those translations be used?’

3 Today's successor to the American Documentation Institute is the Association for Information Science and Technology (ASIS&T).

2.1 What to Translate?

For CLIR, the question of what to translate exists at two levels. First we can ask which items should be translated—the queries or the documents? With that question answered, we must then ask how those documents or queries can be broken up into units (which we call terms) that can be translated. First things first, we start with the question of whether it is the query or the documents that should be translated.

The Query or The Documents?

Perhaps it seems obvious that it is the query that should be translated, since queries are short and documents are long. Indeed, query translation is widely used in CLIR experiments, specifically because it is efficient. But efficiency considerations may come out differently when the query workload is very high, as is the case in Web search.4 On the other hand, query translation has clear advantages when the goal is to cull through streaming content for documents that satisfy a standing query. The point here is that the conventional wisdom that query translation is more efficient may or may not be true—much depends on the details of the task.

However, document translation has another potential advantage beyond the (relatively rare) cases in which it is the more efficient choice. The words in a document occur in sequence, and we have good computational models for leveraging such sequences to make more accurate translations. Of course, some queries include word sequences as well, but word sequences in queries are often very short, and thus possibly less informative. Modern translation models are trained using paired sentences from paired documents, and pairings of documents with their translations are much more widely available than examples of queries paired with their translations. If people wrote queries like they wrote documents, this would not be a problem. But they don't. As a result, even when query translation would be more efficient, document translation might be more accurate.

One further factor to consider when choosing between query translation and document translation is which direction yields the simplest index structure. Query translation has advantages for applications in which there are many possible query languages, but only one document language. In such cases, space efficiency argues for indexing in the document language, and thus translating the queries. Symmetrically, document translation has advantages when all the queries are in one language, but there are many document languages to be searched. In such cases, space efficiency argues for indexing in the one query language.

4 The indexed Web contains 5.6 billion pages, but Google alone receives 5.6 billion queries per day. At that rate, it doesn't take many days for the number of query words to become larger than the number of indexed document words.

Translation direction might moreover influence translation quality. Translating into morphologically rich languages usually works less well than translating into languages with simpler morphology, such as English. Translating from a language with no explicit word boundaries, such as Chinese or Japanese, might be harder than translating into such a language. Translation direction thus might also have an effect on the subsequent CLIR application. Another obvious advantage to document translation is that if the translations produced at indexing time are human-readable (not all translations are) then readers who are unable to read retrieved documents in their original language can be shown cached translations. So the picture is complex, with several factors to be considered when choosing between query translation and document translation. But those are not the only two choices—there are two others.

The third choice is to translate both the queries and the documents. For example, if we want to use Navajo queries to search Burmese content we might translate both the Navajo and the Burmese into English, simply because there are more language resources for Navajo-English and for Burmese-English than there are for Navajo-Burmese. If we broaden our concept of translation to include any mapping of an item (e.g., a word or a character sequence) from a representation specific to one language to some representation that is not specific to that language, then we can think of bilingual embedding (e.g., [98]) as a form of translation. With bilingual or multilingual embeddings, addressed in detail in Section 3.2, we map from language-specific representations (in this case, sparse vectors that indicate the presence of specific terms in a specific language) to distributed representations (i.e., dense vectors) that can be thought of as language-independent representations of meaning. Here we have in some sense translated the query and the documents; we have just translated them into an abstract representation rather than a representation associated with any single language.

As long as we're being exhaustive here, we should note that doing no translation at all is the fourth possibility. When character sets are the same, sometimes proper names or loan words will match. Although this is almost never a good idea on its own (with the possible exception of very close dialect differences such as British and American English), the presence of such felicitous “translations” can serve as useful anchor points around which more sophisticated and capable approaches can be constructed. For example, McNamee and Mayfield used pre-translation blind relevance feedback to enrich queries with such terms, thus producing a surprisingly good baseline for some language pairs to which other techniques could be compared [115].

Finally, we should note that when suggesting translating the query or translating the documents as possibilities, one need not be limited to translating just once. Translation activity follows cultural contact patterns, and translation systems are most easily built for the language pairs in which there already is substantial translation activity. For language pairs for which there is little cultural contact (e.g., Navajo and Burmese, as in our earlier example), it can be useful to translate through a “pivot” language (in this case, perhaps English) that has cultural contact with both. For example, to use Navajo queries to search Burmese content with query translation, we might first translate the query from Navajo to English and then from English to Burmese. As Ballesteros and Sanderson [12] have shown, applying query expansion each time translation is performed can serve to limit the adverse effect of cascading errors.

Which Terms?

When we think of translation, we often think about translating words. But the concept of a word is actually quite slippery. Is poet a word? If so, is poets a different word, or is it something else—a plural form of a word? If we think of words as abstract objects that have forms, then poet's and poets' and even poet are yet more forms of the same word, and poet is its root form. But information retrieval people have a much simpler view of the world. We don't call what we use in our searches words, but rather terms.5 A term is simply whatever we choose to index in an IR system. This might be a token (in which case we need to say what choices our tokenizer makes!), it might be a stem (the part of the word which is shared across the inflected variants of that word) [146], or it might be simply a character sequence (what we typically call character n-grams).

Many other such choices are also made when choosing terms. For example, terms can be kept in their original case or lowercased, diacritic marks can be retained or removed, substitutable characters can be retained as-is or normalized, and stopwords can be removed or retained. The choices made on each of these points often differ from those used when human-readable one-best translation is the goal, because information retrieval tasks (and thus information retrieval evaluation measures) often place greater emphasis on recall (i.e., on minimizing false negatives) than do one-best machine translation tasks.
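The following minimal sketch illustrates how several of these term-formation choices (lowercasing, diacritic removal, stopword removal, and character n-grams) might be realized. The tokenizer, stopword list, and n-gram length are placeholders chosen for illustration, not recommendations made in this chapter.

import re
import unicodedata

STOPWORDS = {"the", "a", "of", "and", "des", "les"}   # illustrative stopword list

def strip_diacritics(text: str) -> str:
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def char_ngrams(token: str, n: int = 4) -> list[str]:
    # Overlapping character n-grams; pad short tokens so they still yield a term.
    padded = token if len(token) >= n else token.ljust(n, "_")
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def make_terms(text: str, use_ngrams: bool = False) -> list[str]:
    text = strip_diacritics(text.lower())
    tokens = re.findall(r"\w+", text)                 # a naive tokenizer
    tokens = [t for t in tokens if t not in STOPWORDS]
    if not use_ngrams:
        return tokens
    return [g for t in tokens for g in char_ngrams(t)]

print(make_terms("Les poètes écrivent des poèmes", use_ngrams=True))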

None of this is specific to CLIR; these same issues arise in any IR application. What is specific to CLIR, however, is that these choices affect not just the general retrieval approach, but also the translation functions that are indeed specific to CLIR. This is because, in the simplest approaches to CLIR, when we translate we simply translate terms in one language into terms in the other language. We can pay attention to word order (which often differs between languages) if our retrieval system leverages proximity or word order (or both), but often it may suffice to simply get the right terms in the other language. So, for example, we can translate from n-grams to n-grams [114].

It is worth noting that we have used tokens, stems and n-grams as examples, but there's more to making terms than just those three options. For example, to our way of thinking in information retrieval, the statistical phrases used in Statistical Machine Translation (SMT) are also terms. So in CLIR, terms are not just what you index; they are also what you translate. Note, however, that this introduces a tension, since longer terms limit translation ambiguity, whereas shorter terms offer more scope for improving recall.

5 The one exception is the so-called “bag of words” representation, which is of course in actuality a bag (in the set theory context) of terms.

2.2 Which Translations are Possible?

Once we choose which terms we need to be able to translate, our next questions are which translations are possible for each term, and (extending that further) how likely is it that each possible translation reflects the intended meaning? There are at least four kinds of places we might look for the answer to that question:

1. We might look translations up in some manually built lexicon. In CLIR, this is referred to as dictionary-based translation (regardless of whether the lexicon was actually created from a human-readable dictionary). Most often, this technique is applied to queries, and thus referred to as Dictionary-Based Query Translation (DBQT); a minimal sketch of this option appears after this list. In practice, there are many kinds of manually built lexicons that might be mined; among the most common are bilingual phrase lists, bilingual dictionaries (which contain not just term mappings but also definitions, parts of speech, and examples of usage), multilingual thesauri (thus reusing technology from the first wave), and multilingual ontologies (e.g., page titles in multiple languages mined from Wikipedia). Using actual bilingual dictionaries would make it possible to use part of speech tags as constraints, but accurate part of speech tags can be hard to generate automatically for short queries, so this constraint has proven to be of less use than might be expected. Some lexicons order alternative translations in decreasing order of general use, and that ordering can be exploited to avoid overweighting uncommon translations.

2. We might estimate translation probabilities from observed language use in one of three ways:

– By far the most successful way of doing this in CLIR has been to mine parallel text (i.e., translation-equivalent passages, usually sentences) to learn translation mappings that include a probability for each translation. This can be done explicitly using Statistical Machine Translation (SMT) models [189] or Neural Machine Translation (NMT) models [198], or it can be done implicitly using a complete SMT or NMT system that produces 1-best or n-best results. Of course, the relative prevalence of translations in the parallel text that is used to learn the probabilities must be similar to that expected in the collection to be searched; learning from translations of tractor manuals might not be wise when the goal is to search poetry.

– An alternative source would be comparable corpora (i.e., collections of documents on similar topics that are not translations of each other), which can be used to learn plausible translations. When such links between pairs of documents are available, these mappings can either be learned explicitly [165] or implicitly [100, 150]. Indeed, it is even possible to learn implicit mappings with no links between document pairs at all by leveraging regularities in usage that are preserved across languages [25]. These comparable corpus techniques are limited by corpus size, however, since orders of magnitude larger corpora are needed for comparable than for parallel corpora to get similarly good results for relatively rare words. This is important because in many settings relatively rare words are relatively common in information retrieval queries.

– Finally, it is also possible to mine collections of mixed-language documents for translations. One approach to this is to focus on inline definitions that explain the meaning of a term in one language using terms in another language [93]. Such an approach to writing is used to introduce new technical terms in some settings.

3. We might also generate (or implicitly recognize) “translations” using an orthographic or phonetic transcription algorithm. Buckley, writing somewhat tongue in cheek, famously observed that for purposes of information retrieval, French can be thought of as misspelled English and ‘corrected’ using a spelling correction algorithm [22]. More generally, transliteration can be performed based on spelling (essentially, character-level translation) or pronunciation, and it can be generative (forward transliteration), inferential (backward transliteration), or implicit (cognate matching) [24]. Transliteration is particularly useful for proper names, but it is a potentially useful fallback any time translations are not available from lexicons or from corpus-based methods.

4. Finally, the original source of all translation knowledge is human intelligence, so when all else fails we might turn to the user for help. For example, an interactive system might leverage limited evidence of possible translations to engage the searcher in a dialog regarding how best to translate a query term [134], or systems might show searchers snippets that include untranslatable terms in the hope that the searcher would be able to interpret the term in context.
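As a concrete illustration of the first option above, the sketch below performs dictionary-based query translation: each query term is looked up in a tiny, made-up bilingual lexicon whose alternatives are listed in decreasing order of general use, and later alternatives are down-weighted to avoid overweighting uncommon translations. The lexicon and weighting scheme are assumptions made only for illustration, not a specific published method.

# A toy Spanish-to-English lexicon; alternatives are listed in decreasing
# order of general use, as some published lexicons do.
LEXICON = {
    "banco": ["bank", "bench", "pew"],
    "nacional": ["national"],
}

def translate_query(query_terms, max_alternatives=3):
    """Map each query term to weighted document-language alternatives."""
    translated = []
    for term in query_terms:
        alternatives = LEXICON.get(term, [term])[:max_alternatives]   # fall back to the untranslated term
        # Simple rank-based weights (1, 1/2, 1/3, ...) over the listed order, then normalized.
        weights = [1.0 / (rank + 1) for rank in range(len(alternatives))]
        total = sum(weights)
        translated.append([(alt, w / total) for alt, w in zip(alternatives, weights)])
    return translated

print(translate_query(["banco", "nacional"]))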

2.3 How to Use those Translations?

Much of the early work on dictionary-based query translation involved simply substituting one document-language term for each query-language term. Since many terms are homonymous,6 by which is meant that they have more than one quite different meaning, this naturally led to the question of which translation should be used. The usual approach was to look to other terms in the query in an effort to disambiguate the intended meaning of each query term. However, one term might have both distantly and closely related translations (e.g., the Spanish word banco can be translated into English as bank, bench, or pew), so a better question (at least when more than one of the translations is plausible) would be which translations should be used. That naturally leads to the question of how several possible translations for a single term should be used, and that turns out to be the right question in general (since a single known translation is simply a special case of that more general question).

6 People often say polysemous when they mean to refer to different meanings for terms that are written identically, but when speaking precisely polysemy refers to closely related meanings—often not a problem in IR—whereas homonymy refers to unrelated meanings.

Initial attempts at using multiple translations, such as stuffing all of the translations into one very long query, proved to be problematic. Indeed, even careful attention to the balance of weights between query terms did not work particularly well. The fundamental problem was that ranking functions tend to give more weight to query terms that are rare, so it was the rarest translations that were dominating the results—exactly the opposite of what you would want. Still today it is not uncommon to see unbalanced query translation inappropriately used as a baseline in CLIR experiments. A better approach, based on closer coupling of translation and retrieval, was ultimately introduced by Pirkola [144]. In Pirkola's Structured Query (SQ) method, the key idea was to think of translation as synonymy: every time any synonym (here, any translation) of a query term was seen in a document, the count for the query term in that document was incremented. With this insight in hand, it was a simple matter to use the synonym operator in the Inquery information retrieval system, a generalization of Inquery's query-time stemming, to perform CLIR.

Meanwhile, a parallel line of work on CLIR had developed in which people were simply cascading Machine Translation (MT) and IR to create CLIR systems. This too usually just substituted one word for another, but the MT system now made the choice of what that single replacement should be. CLIR researchers soon realized that more could be gained by using the internal representations of the MT system, however, to identify synonyms. Thus dictionary-based and MT-based CLIR researchers were on convergent paths.

The unification of CLIR and MT that occurred in the late 1990's led to a further extension to the structured query method, asking not just which translations were possible, but how likely each document term was to have been translated into a query term. Note the direction of the conditional probability here; we are interested in the probability that a query term is a translation of a document term, not the other way around. We can then ask, for every document, what the expected counts of the query terms would have been if the document had been written in the query language. If e_{i,j} is the expected count of query term e_i in document j and f_{k,j} is the actual count of document term f_k in document j, then we get:

e_{i,j} = \sum_{k \in j} p(e_i | f_k) \, f_{k,j} ,        (1)

where p(e_i | f_k) is the probability that document term f_k (e.g., banco) would translate to query term e_i (e.g., pew). This probability might be estimated in many ways, but by far the most popular approach when parallel (i.e., translation-equivalent) text is available has been to perform term-level alignment and then compute the maximum likelihood estimate of the probabilities (i.e., the approach known in machine translation as IBM Model 1) [113].
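As a small worked example with invented numbers: if a document j contains banco three times, p(pew | banco) = 0.1, and no other term in that document can translate to pew, then the expected count of the query term pew in that document is

e_{pew,j} = 0.1 \times 3 = 0.3 .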


This approach to using estimated translation probabilities was first tried with document ranking functions based on unigram language models, but the formulation is general and can be applied with any ranking function. In general, ranking functions in information retrieval are based on three elements of a term-by-document matrix: (1) an element statistic based on the number of occurrences of a term in a document (for representing the “aboutness” of a document), (2) a row statistic based on the number of documents in which a term occurs (for representing the specificity of a term), and (3) a column statistic based on the number of terms in each document (which can be used to compute a density from the element statistic). For brevity, these are typically called term frequency (TF), inverse document frequency (IDF), and length, respectively, although it is useful to remember that in practice what is often meant are not raw counts but rather monotonic functions of those counts.

With that in mind, it is easy to see how the same approach can be used to estimate document frequency. Letting k range over the distinct terms in the document language, and defining |f_k| as the number of documents containing term f_k and |e_i| as the expected number of documents in which query term e_i would have appeared (if the documents had been written in the query language), we get:

|e_i| = \sum_{k} p(e_i | f_k) \, |f_k|        (2)

This formula actually overestimates |e_i| somewhat (because two terms that translate to e_i might be found in the same document), but the error has been shown experimentally to typically be inconsequential [35]. Alternatively, if a suitably large and representative side collection is available in the query language, |e_i| can be estimated directly from that side collection. For length, it typically suffices to directly use the length as computed in the document language. It is well known that different languages can use more or fewer words to express the same content, but for documents of any reasonable length the scaling factor is fairly close to a constant, and most of the widely used ranking functions produce rankings that are insensitive to constant factors. So adjustments to the document length are not typically required. Darwish and Oard called this approach of separately estimating query-language term frequency and query-language document frequency vectors from the observed document-language statistics Probabilistic Structured Queries (PSQ), since it extends Pirkola's SQ approach by allowing partial mappings [35]. The approach can also be seen as an extension of an approach that language modeling researchers had earlier introduced for the case in which only term frequency statistics needed to be mapped across languages [189].

PSQ is essentially vector translation; you start with either the TF vector or the DF vector in the document language, multiply that vector by a translation probability matrix, and get a TF or DF vector in the query language. At that point, ranking proceeds as if the documents had been written in the query language in the first place. The key point is that TF and DF vectors should be translated separately; precombining those two factors (e.g., using TFIDF or BM25 term weights) before translation does not work nearly as well.7 The only twist is that these “translated” TF or DF vectors are no longer vectors of integer counts, but rather are now vectors of real-valued (i.e., partial) term counts. This is typically associated with some spreading of the counts across more terms, which has implications for both effectiveness and efficiency. Empirical results indicate that adverse effects on effectiveness from that spreading are often balanced by gains in (average) effectiveness from the expansion effect of the translation mapping (which can usefully add synonyms). Adverse effects on efficiency can be limited by pruning the set of possible translations, although overly aggressive pruning can of course adversely affect effectiveness.

7 The reason for this is that IDF is perhaps best thought of as a measure of the selectivity of a query term.
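A minimal sketch of this vector-translation view of PSQ appears below. It uses dense arrays for clarity, invented probabilities and counts, and a simple TF-IDF scorer; a real system would use sparse matrices, estimated alignment probabilities, and whatever ranking function it already supports.

import numpy as np

# Toy setting: 2 document-language terms (banco, orilla) and 3 query-language
# terms (bank, bench, pew). P[f, e] plays the role of p(e | f); values are invented.
P = np.array([
    [0.7, 0.2, 0.1],   # banco  -> bank / bench / pew
    [0.9, 0.1, 0.0],   # orilla -> bank / bench / pew
])

tf_doc = np.array([[3, 0],   # document 1: banco x3
                   [1, 2]])  # document 2: banco x1, orilla x2
df_doc = np.array([2, 1])    # document frequencies of banco, orilla

P_pruned = np.where(P >= 0.15, P, 0.0)   # prune unlikely translations for efficiency

tf_query_lang = tf_doc @ P_pruned        # expected query-language term frequencies (Eq. 1)
df_query_lang = df_doc @ P_pruned        # approximate query-language document frequencies (Eq. 2)

# Ranking now proceeds as if the documents had been written in the query language,
# here with a simple TF-IDF score for the one-term query "bench".
n_docs = tf_doc.shape[0]
idf = np.log((n_docs + 1) / (df_query_lang + 1))
query = np.array([0.0, 1.0, 0.0])        # indicator vector for "bench"
scores = (tf_query_lang * idf) @ query
print(tf_query_lang)
print(df_query_lang)
print(scores)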

This line of development left one key question unresolved: how translation should be done when the ranking function makes use of term proximity, term order, or both. The problem was that the initial approach to PSQ translated only decontextualized terms. The first to tackle that problem were Federico and Bertoldi, leveraging the observation that translation can change word order, but that long-distance reordering is less common [42]. They therefore experimented with a variant of what has subsequently come to be called the Sequential Dependence (SD) model [117] in which both original and reversed word orders were treated as equally informative.

Work on PSQ still continues, most recently by Rahimi et al. [145], who show that further improvements can be obtained in some cases by balancing discrimination values (e.g., IDF) calculated in the query and the document languages.

3 Updating Nie: CLIR since 2010

If a CLIR counterpart to Rip Van Winkle were to have fallen asleep in 2010, only to awake in 2021, the thing that surely would most impress them is the tremendous energy around neural methods. Neural networks offer ways of learning nonlinear models that have the potential to improve over simpler linear and hand-engineered nonlinear models [67]. Widespread adoption of rectified linear units over the last decade has made it possible to train much deeper models, and this in turn has led to a blossoming of new model architectures. Applications of these techniques to CLIR have faced three challenges: (1) initial work on Neural MT focused principally on one-best translation for human readability rather than on tuning for retrieval tasks, (2) neural retrieval models are data intensive, and present training data has largely focused on monolingual applications, and (3) neural methods are computationally intensive, thus placing a renewed premium on efficiency. In this section we adopt key aspects of the structure of Section 2 to review advances in CLIR since 2010.

3.1 What to Translate: Parts of Words

As described above, systems for handling text as tokens must choose how to segment that text into terms. This task has received particular attention among Machine Translation (MT) and Automatic Speech Recognition (ASR) researchers, who have developed new representations that can also be useful for retrieval tasks. Byte-Pair Encoding (BPE) [164] and Wordpiece [163] are now widely used in MT. Both of these methods use subword units to deal with rare words, thus mitigating the out-of-vocabulary problem. In BPE, subword splitting is first trained on a collection so that a subword vocabulary of a pre-specified size is created. Wordpiece works similarly, but the decision about which subwords to generate is made by maximizing language model log likelihood rather than word frequency as in BPE. Wordpiece is especially widely used in BERT models (see below). In contrast to these models, which operate on individual tokens, Sentencepiece [84] operates at the sentence level by first joining all the words in the sentence, and then re-segmenting that aggregated character sequence by inserting a special token at the segmentation points.
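A minimal, illustrative implementation of the BPE merge loop (following the general recipe of [164], simplified and not tied to any particular toolkit) is sketched below; in practice one would use an existing subword library rather than code like this.

import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace each occurrence of the symbol pair with a single merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Words are represented as space-separated characters plus an end-of-word marker.
corpus = Counter(["lower", "lower", "lowest", "newer", "newest", "wider"])
vocab = {" ".join(word) + " </w>": freq for word, freq in corpus.items()}

for _ in range(10):                       # the number of merges sets the subword vocabulary budget
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print("merged:", best)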

3.2 Which Translations: Cross-Lingual Embeddings

The third wave of CLIR (characterized by different representations for queries and for documents) is built from three key innovations: (1) some way of creating dense vector representations in which terms with similar meaning are represented by similar vectors, (2) deep neural architectures that can be trained using back-propagation, and (3) the learned self-attention of the neural transformer architecture. The first of these emerged initially in second-wave IR research, and that is our focus here. Initial work on creating dense vector representations for terms dates to the introduction of Latent Semantic Indexing (LSI) in 1988 [47]. At the time, linear algebra, specifically a truncated Singular Value Decomposition (SVD), was used to compute a dense representation for each term (known as singular vectors). Today, neural autoencoders use non-linear models to do the same thing: compute a dense representation for each term. These dense representations capture an embedding of the original high-dimensional term space in a lower-dimensional vector space. Given a fixed vocabulary (e.g., from BPE), they can be trained in advance, thus reducing the term-level embedding process to a simple table lookup.

It did not take long for researchers to realize that bilingual embeddings could be created that assigned similar dense representations to terms with similar meanings, regardless of their language. Initially these bilingual embeddings were also constructed using linear algebra [101, 100]; today such representations can also be learned using autoencoders [183, 16]. Three broad classes of techniques have been proposed for creating bilingual term embeddings: (1) pseudo-bilingual, (2) projection-based, and (3) unsupervised. For a detailed treatment of these approaches generally, we refer readers to a survey on this topic by Søgaard et al. [170]. Here we focus on those parts of the story that involve CLIR.

Landauer and Littman [87] were the first to propose the pseudo-bilingual approach. The key idea in this approach is to create a set of mixed-language documents from a comparable corpus using explicit links between the documents (as discussed above in Section 2.2), and then to learn embeddings from that collection. Specifically, they concatenated English and French versions of the same document, and then applied a truncated SVD to learn a dense representation for terms in both languages. Because the truncation tended to preserve meaning while suppressing the effect of term choice, the approach learned similar vectors for terms in different languages that had similar meaning. Twenty-five years later, Vulic and Moens [183] did something similar using a neural autoencoder and document-aligned comparable corpora from Wikipedia and Europarl. Instead of SVD, they trained a skip-gram [118] model on documents created by concatenating each English document with its comparable document in Dutch, and then shuffling the terms. When used with an unusually wide window, this allows the skip-gram model to take bilingual context into account when generating an embedding for each term. However, randomly shuffling the terms might lead to a sub-optimal choice of bilingual context for certain terms. Improving over random shuffling, Bonab et al. [16] used a bilingual dictionary to guide the process of interleaving terms from the two languages so that terms that are putative translations of each other would be adjacent in merged sentences. This makes it likely that a valid translation will appear in the neighborhood context of a term, which can then be used to learn better representations for both that term and its translation. In their work, they used sentence-aligned parallel text and merged on a sentence-by-sentence basis.
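The following sketch illustrates the pseudo-bilingual idea in its original LSI form: a toy term-by-document matrix over concatenated document pairs is factored with a truncated SVD, and terms that co-occur across languages end up with similar dense vectors. The vocabulary, counts, and dimensionality are placeholders.

import numpy as np

# Rows are terms from both languages, columns are concatenated English+French
# document pairs; entries are (toy) term counts.
terms = ["bank", "money", "river", "banque", "argent", "rivière"]
X = np.array([
    [2, 0, 1],   # bank
    [3, 0, 0],   # money
    [0, 2, 1],   # river
    [2, 0, 1],   # banque
    [3, 0, 0],   # argent
    [0, 2, 1],   # rivière
], dtype=float)

# Truncated SVD: keep k singular vectors; rows of U_k * S_k are dense term vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k] * S[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms with similar usage across the concatenated pairs end up close across languages.
print(cosine(term_vectors[terms.index("money")], term_vectors[terms.index("argent")]))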

Littman et al. [101] were the first to propose the projection-based approach for CLIR, in which independently learned monolingual embeddings are aligned using a learned linear transformation such that known translations (e.g., from a bilingual dictionary, as described in Section 2.2) yield similar representations. This approach is now very widely used, typically being applied to align monolingual embeddings that were created using autoencoders. Because fairly large dictionaries are now available for many language pairs, the projection-based approach is easily scaled to create multilingual embeddings that today can represent terms from more than 50 languages [4, 86, 70]. In recent work with embeddings learned using neural autoencoders, Bhattacharya et al. [14] used linear regression to learn a transformation between Hindi and English monolingual embeddings using a Hindi-English translation dictionary. Similarly, Litschko et al. [98] mapped source and target embedding spaces into a single shared space by learning two projection matrices, one for each language, to create a new joint space that is different from that used for either language [169]. In that work, they compare three ways of creating projection matrices: Canonical Correlation Analysis (CCA) [41], minimizing the Euclidean distance between the projected vector and the target vector [119], and maximizing a retrieval-oriented measure they call Cross-Domain Similarity Local Scaling (CSLS) [86, 70]. In multilingual CLIR, Bhattacharya et al. [13] used two different approaches to learn multilingual embeddings: a projection-based method in which Bengali and Hindi were mapped to an English embedding space, and the pseudo-bilingual approach proposed by Vulic and Moens [183], extended to multiple languages.
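A minimal sketch of the projection-based idea, learning a least-squares linear map from source-language embeddings to target-language embeddings from a seed dictionary, is shown below. The embeddings and dictionary here are random placeholders; a real system would use pretrained monolingual embeddings and an actual bilingual dictionary, and might use one of the other objectives mentioned above instead of least squares.

import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 50, 200

# Placeholder embeddings for dictionary pairs: row i of X (source language)
# is a known translation of row i of Y (target language).
X = rng.normal(size=(n_pairs, dim))
Y = rng.normal(size=(n_pairs, dim))

# Least-squares projection: find W minimizing ||X W - Y||_F.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Any source-language term vector can now be projected into the target space
# and compared against target-language term vectors, e.g. by cosine similarity.
projected = X @ W
print(projected.shape)   # (200, 50)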

Recently, there has also been some work on unsupervised approaches to learning bilingual embeddings using projection-based techniques [6, 86, 62]. While supervised approaches rely on sources such as bilingual dictionaries to learn alignments, unsupervised approaches instead follow a two-step approach: (1) build an initial seed dictionary from monolingual vector spaces using some unsupervised approach, and (2) iteratively refine that dictionary to add more pairs that will eventually be used to learn better embeddings. Although several unsupervised approaches exist for learning the initial dictionary, the key idea in the second step is to project one embedding space onto another and then to use some of the near neighbors in the aligned spaces as the added dictionary entries. The induced dictionary can be iteratively refined using Procrustes Analysis [162], and the refined dictionary can then be used to learn better bilingual embeddings. This does, however, assume that the vector spaces are isomorphic, which need not be true for languages that are typologically different, and this approach has been shown not to work well in such cases [184]. Litschko et al. [98] and Litschko et al. [99] used these unsupervised approaches to perform fully unsupervised CLIR for language pairs with limited training resources. However, the effectiveness of fully unsupervised approaches was found to be rather limited when compared to supervised methods [184].
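The refinement loop can be sketched roughly as follows. This is a simplified, illustrative version (real systems add frequency cutoffs, CSLS-based neighbor retrieval, and symmetry checks, and inducing the initial seed dictionary is itself a separate unsupervised step); the embeddings below are random placeholders.

import numpy as np

def procrustes(X, Y):
    # Orthogonal map W minimizing ||X W - Y||_F, computed from the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def refine(src_emb, tgt_emb, seed_pairs, iterations=5, keep=1000):
    """Iteratively refine a noisy seed dictionary of (source_idx, target_idx) pairs."""
    pairs = list(seed_pairs)
    W = np.eye(src_emb.shape[1])
    for _ in range(iterations):
        src_idx = np.array([s for s, _ in pairs])
        tgt_idx = np.array([t for _, t in pairs])
        W = procrustes(src_emb[src_idx], tgt_emb[tgt_idx])
        # Re-induce the dictionary: nearest target neighbor of each projected source term.
        sims = (src_emb @ W) @ tgt_emb.T
        nearest = sims.argmax(axis=1)
        confidence = sims.max(axis=1)
        best_sources = np.argsort(-confidence)[:keep]
        pairs = [(int(s), int(nearest[s])) for s in best_sources]
    return W, pairs

rng = np.random.default_rng(1)
src = rng.normal(size=(500, 50)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = rng.normal(size=(500, 50)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
W, induced = refine(src, tgt, seed_pairs=[(i, i) for i in range(50)])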

One problem with embeddings learned using linear algebra or by autoencoders is that homonymy distorts the representation, since the model has no way of knowing whether this banco is a financial institution or a wooden bench in a park. The advent of custom neural architectures made it possible to produce context-sensitive embeddings, with the embedding for banco differing depending on its surrounding terms [140, 36]. The resulting Bidirectional Encoder Representations from Transformers (BERT) architecture [36] has since been further extended (e.g., RoBERTa [103], XLNet [191], ELECTRA [27], XLM [85]), and some of these extensions support more than 100 languages. We should emphasize that this use of BERT and its derivatives to produce contextual embeddings differs from the third-wave use of BERT that we describe in more detail below for direct query-document matching.

All the techniques discussed so far for producing cross-lingual embeddings produce generic representations for queries and documents that need to be tuned for the CLIR task. An alternative is to learn representations that are tailored for CLIR, as has been done in monolingual retrieval [65, 196]. The advantage of these “representation-based” learning approaches is that the representations of the documents can be precomputed, and during retrieval the score on which ranking is based can be rapidly computed (more on this in Section 3.4). The key challenge then lies in learning specialized representations of queries and documents that are useful for this task. Gupta et al. [61] learned CLIR-specific representations for queries and documents using a transfer learning approach based on a parallel corpus. First, they learn a representation model similar to DSSM [65] by using a monolingual retrieval collection in English to learn representations such that the similarity computed from the representations of queries and their relevant documents is maximized, whereas similarity with non-relevant documents is minimized. They then extend the trained monolingual model to the cross-language setting using an English-Hindi parallel corpus. To do so, the monolingual model was applied to the English side of the parallel text and a cross-lingual model was trained on the Hindi side. The goal was to maximize the similarity of the English representation computed using the monolingual model and the Hindi representation computed using the cross-lingual model for a given translation pair. In a somewhat different approach, Li and Cheng [91] employed an adversarial learning framework using a Generative Adversarial Network (GAN) [57] to learn a CLIR-specific representation by incorporating both monolingual and cross-lingual matching signals. The model takes as input three sources: raw queries, documents, and translated queries (in this case, translated using Google Translate). They incorporated several constraints in their model to learn task-specific representations: (1) a monolingual constraint that brings the representations of documents and translated queries closer, (2) a cross-language constraint that brings the representations of documents and raw queries closer, (3) a translation constraint that brings the representations of raw and translated queries closer, and (4) an adversarial constraint that produces source-agnostic representations.
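As a rough illustration of what training a representation-based CLIR model can look like, the sketch below shows a generic dual encoder trained with an in-batch contrastive loss. It is not a reimplementation of either model discussed above; the encoders, batch construction, and hyperparameters are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BagOfTermsEncoder(nn.Module):
    """Encode a batch of term-id sequences into one dense, unit-length vector each."""
    def __init__(self, vocab_size=10000, dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Linear(dim, dim)

    def forward(self, term_ids, offsets):
        return F.normalize(self.proj(self.embed(term_ids, offsets)), dim=-1)

query_encoder = BagOfTermsEncoder()     # query-language encoder
doc_encoder = BagOfTermsEncoder()       # document-language encoder

# Placeholder batch: 4 aligned query/document pairs packed for EmbeddingBag.
q_ids = torch.randint(0, 10000, (12,)); q_offsets = torch.tensor([0, 3, 6, 9])
d_ids = torch.randint(0, 10000, (20,)); d_offsets = torch.tensor([0, 5, 10, 15])

q = query_encoder(q_ids, q_offsets)     # (4, dim)
d = doc_encoder(d_ids, d_offsets)       # (4, dim)

# In-batch contrastive loss: each query's aligned document is the positive,
# and the other documents in the batch serve as negatives.
scores = q @ d.T / 0.05                 # temperature-scaled cosine similarities
loss = F.cross_entropy(scores, torch.arange(len(q)))
loss.backward()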

3.3 Which Translations: MT

MT has advanced quite rapidly over the past decade, building upon the key innovations in distributed word representations and deep learning technologies. For high-resource language pairs, this has resulted in neural systems convincingly surpassing the performance of the statistical MT systems that had dominated MT research activity in the first decade of this century. This started with the advent of attention-based neural methods [11] built on an underlying encoder-decoder architecture [124]. The key idea behind the attention mechanism is to leverage the fact that models can pay attention to different parts of the source sentence while decoding each target token. Later on, the introduction of subword units (described above in Section 3.1) addressed the problem of translating rare words. Ultimately, these two factors, attention and subword units, proved to be the key ingredients.

As mentioned above in Section 2.1, query translation is often a preferred approach owing to its computational efficiency in experimental settings. However, the key challenges lie in (1) overcoming the ambiguity introduced by short word sequences in queries, and (2) the mismatch between the style of writing of the queries to be translated and the style of the parallel sentences used for training the MT models (as described in Section 2.1). Prior work has tried to tackle the style mismatch issue by creating an in-domain dataset of queries paired with their translations and using it to train SMT/NMT systems. In the context of CLIR, Nikoulina et al. [127] first explored this idea for building an SMT system using a dataset consisting of human-translated queries provided by CLEF. Hu et al. [64] built a small manually curated in-domain dataset consisting of popular search queries, augmenting it with large amounts of parallel out-of-domain sentences to train an NMT model. Taking a different approach, Yao et al. [192] utilized click-through data from a multilingual e-commerce website8 to mine query translation pairs. Another way to minimize the style mismatch effect is to force the MT model to incorporate terms from the retrieval collection in the translation of the query. This has been explored in SMT by using the target retrieval collection to train a language model, which helps in translating the query in a way that better matches the documents [126].

8 https://www.aliexpress.com/

Similarly, Bi et al. [15] trained a neural model by constraining the translation candidates of the query to the set of document terms ranked highly by TFIDF. The set of documents used for this was selected using click-through data for each query. In a somewhat different approach, Sarwar et al. [157] tried to solve the style mismatch problem using a multi-task neural model that optimized query translation jointly with a relevance-based auxiliary task. The goal of the auxiliary task was to restrict the NMT decoder to producing translations using terms that are equally likely in both the retrieval collection and the MT training corpus. The auxiliary task involved learning a relevance-based embedding model on the content of the top-ranked document from the retrieval corpus, using the target-language sentence from the parallel corpus as the query. Since these word embeddings are shared across the NMT and relevance models, this forces the NMT decoder to choose translations that occur in both the retrieval collection and the parallel corpus.

Document translation, on the other hand, has the advantage of the additional context present in longer documents. Attention-based encoder-decoder NMT models leverage the context present on both the source and the target side to produce context-dependent translations, in contrast to dictionary-based or PSQ techniques that rely on context-independent term translation. However, owing to the computational overhead involved in translating the entire document collection (as described above in Section 2.1), there has been less work exploring neural models for document translation than there has been for query translation. Full encoder-decoder models are, however, only one point in a larger design space. Relaxing the contextual constraint to apply only to the document being translated, and not to the sequence of terms that are generated as possible translations, leads to an approach for generating multiple translation hypotheses for each term called the Neural Network Lexical Translation Model (NNLTM) [198]. Applying constraints only on the document side implies that the translations of the document terms are assumed to be independent of each other. While this would affect the fluency of translations intended for human readability, the principal goal of a translation model for CLIR is to generate alternative translations, and not necessarily to properly order all those translations. This is very similar to the case of PSQ relying on the word alignments generated from the SMT system rather than using the language model to help select and order those translations. For each document term, NNLTM uses the neighboring terms within a window to produce contextualized translation alternatives. To do so, while training, NNLTM relies on the alignments from a parallel corpus generated using word aligners and uses the aligned term in the document language, along with the context words surrounding it, to predict the translation in the query language.


Setting aside other qualities of query and document translation described in Section 2.1, such as speed, a natural question to ask is which approach is more effective. Contrary to our expectation, recent work by Saleh and Pecina [156] found that translating queries actually works better than translating documents when retrieving texts from the medical domain. However, the queries were translated from European languages into English, whereas the documents were translated from English into European languages, which is generally considered to be a harder problem. McCarley [113] also tried to answer this question and found that neither approach was the clear winner.

If the quality of MT and CLIR are interconnected, the subsequent question is whether, and how, a machine translation system can be tuned to achieve better CLIR performance. One commonly used approach to integrating retrieval signals with MT is to (re-)rank the translation alternatives produced by the MT system so as to maximize a retrieval objective. This was first attempted by Nikoulina et al. [127], who tuned their SMT system to produce translations that directly optimize the Mean Average Precision (MAP) metric. Since then, Sokolov et al. [171] have focused on optimizing SMT decoder weights to maximize a retrieval objective, producing translations geared towards better retrieval. Saleh and Pecina [154] have since built a feature-based reranking model that reranks the different alternatives produced by the MT system and selects the translation that maximizes a retrieval objective.

3.4 Using Translations: Neural IR

Now that we have different ways of generating bilingual embeddings that are either generic or task-specific, as discussed in Section 3.2, the next question to ask is how to use them for CLIR. One possibility, which we have already mentioned, is to use embedding similarity between query and document representations. One way to create such representations is to create an aggregated query vector from the embeddings of the individual query terms and an aggregated document vector from the embeddings of the individual document terms. Some similarity measure, such as the cosine, can then be used to compute a retrieval score for each document, with documents sorted in decreasing score order. Vulic and Moens [183] explored different forms of aggregation, including unweighted and weighted summation of word embeddings, using IDF values as the weights for the weighted summation. They found that IDF-weighted summation worked better than unweighted summation.
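A sketch of this aggregation-and-cosine approach is shown below. The bilingual embedding space and IDF values are random or made-up placeholders; a real system would use embeddings of the kinds discussed in Section 3.2.

import numpy as np

DIM = 4

def aggregate(terms, embeddings, idf):
    """IDF-weighted sum of term embeddings; terms missing from the space are skipped."""
    vectors = [idf.get(t, 1.0) * embeddings[t] for t in terms if t in embeddings]
    return np.sum(vectors, axis=0) if vectors else np.zeros(DIM)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

rng = np.random.default_rng(2)
# Toy shared bilingual embedding space (English query terms, French document terms).
embeddings = {t: rng.normal(size=DIM) for t in ["bank", "loan", "banque", "prêt", "rivière"]}
idf = {"bank": 2.1, "loan": 3.0, "banque": 2.0, "prêt": 2.9, "rivière": 1.5}

query_vec = aggregate(["bank", "loan"], embeddings, idf)
docs = {"d1": ["banque", "prêt"], "d2": ["rivière"]}
ranking = sorted(docs, key=lambda d: cosine(query_vec, aggregate(docs[d], embeddings, idf)), reverse=True)
print(ranking)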

A second approach to incorporating embeddings into a retrieval system is to use similarity in a bilingual embedding space between terms from different languages as a basis for term translation. We refer to this approach as embedding-based translation. For example, Bhattacharya et al. [14] used the k most similar terms from the document language as translations of a query term. This is similar to query expansion, but in a cross-language setting. In a similar approach, Litschko et al. [98] introduced a model that they called Term-by-Term Query Translation (Tbt-QT) which, as the name suggests, uses embedding-based translation of each term. Specifically, they used the single cross-language nearest neighbor of every query term to do retrieval. As might be expected from experience with pre-neural CLIR methods, this turns out to be sub-optimal when compared with using more than one nearest neighbor [16].

A third approach, which has now been well studied in monolingual IR, uses embeddings as input to custom neural architectures that are optimized for relevance, enabling full-collection neural ranking. In one prominent line of work on monolingual information retrieval known as “interaction-based” ranking, query and document terms are encoded using a single neural network initialized with Word2Vec [118] monolingual representations, and the model learns interactions between those representations to maximize a relevance objective (DRMM [59], KNRM [188], or PACRR [66]). The work of Yu and Allan [195] extends these approaches to CLIR by building such interaction-based neural matching models. These matching models are initialized with fastText embeddings aligned using a dictionary [70], which is a type of projection-based cross-lingual word embedding described in Section 3.2. Subsequent research in monolingual retrieval has focused on switching from context-independent Word2Vec embeddings to the context-sensitive BERT-based representations described above in Section 3.2 to initialize the neural ranking models. These third-wave context-sensitive Transformer-based architectures have achieved results superior to the best previously known techniques for ranking documents with respect to a query in monolingual applications [95, 78, 107]. Applications of these techniques to CLIR have yet to appear, however.

Interaction ranking models that have connections between each query term and every document term can be computationally expensive when ranking all of the documents in a large collection. This leads to the fourth way of employing embeddings in CLIR, cascade-based approaches. These involve running an efficient recall-oriented ranker first to get an initial ranked list, which is then re-ranked using a more computationally expensive model [34, 129, 194]. Zhang et al. [202] extended the cascade re-ranking approach to the cross-language setting and found improvements similar to those in monolingual retrieval, especially for low-resource languages. Jiang et al. [68] fine-tuned a pretrained multilingual BERT model on cross-language query-sentence pairs constructed from parallel corpora in a weakly supervised fashion and applied this model to perform re-ranking for CLIR. Shi and Lin [167] used a transfer learning approach, applying a retrieval model trained on a large collection in English to retrieve from collections in other languages. However, the performance of these cascade re-ranking approaches is limited by the quality of the initial ranked list produced by the recall-oriented ranker.
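The cascade pattern itself is simple; the following sketch (with placeholder scoring functions rather than any particular system's components) shows a first-stage recall-oriented ranker followed by expensive re-ranking of only the top-k candidates:

```python
# A minimal sketch of a cascade (retrieve-then-rerank) pipeline: a cheap
# recall-oriented ranker scores the whole collection, and only its top-k
# results are re-scored by an expensive model (e.g., a cross-language BERT
# re-ranker).  `cheap_score` and `expensive_score` are assumed placeholders.
def cascade_search(query, collection, cheap_score, expensive_score, k=100):
    # Stage 1: recall-oriented ranking over the full collection.
    first_stage = sorted(collection,
                         key=lambda doc: cheap_score(query, doc),
                         reverse=True)[:k]
    # Stage 2: expensive re-ranking of the top-k candidates only.
    return sorted(first_stage,
                  key=lambda doc: expensive_score(query, doc),
                  reverse=True)
```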

3.5 Fusion

Different approaches to IR have different strengths and weaknesses. This inherent diversity suggests that there is sometimes potential for improving search results by combining different approaches. CLIR systems have even greater potential for benefiting from diversity than do monolingual IR systems because of the additional potential for diversity that translation resources, and ways of using those translation resources, introduce. Combining multiple sources of evidence has a long heritage, and the inclusive name for the general approach is fusion [31, 187, 32]. In IR we distinguish between two classes of fusion approaches: early fusion, in which evidence from multiple sources is combined by one or more components of the full system before results are generated, and late fusion, in which the results of separate IR systems are combined. Late fusion is often referred to as system combination. In general, system combination can be employed with systems that search different document collections, but here we focus only on cases in which all systems search the same collection.

The earliest work on system combination for CLIR dates to 1997 when, as noted above, McCarley [113] found that neither query translation nor document translation was a clear winner. McCarley went on to test a late fusion combination between the two approaches, finding that the combination of the two yielded the best results. McCarley had endeavored to hold every other aspect of the system design constant, but later experiments by Braschler [21] and by Kishida and Kando [80] demonstrated benefits from combining more diverse systems as well.

Early fusion of translation resources has also proven to be successful in CLIR. Perhaps the best known case is the fusion of parallel sentences with translation pairs from a bilingual lexicon when training machine translation systems. The architecture for this is simple: machine translation systems are trained on pairs of translation-equivalent sentences, and term pairs from a bilingual lexicon are simply treated as very short sentences. Because term distributions in bilingual lexicons are not as sharply skewed in favor of common terms as is naturally occurring parallel text, this approach can help to overcome the natural weakness of parallel text collections (limited size) and reliably learn translations of relatively rare terms.

One challenge that arises in both early and late fusion is that the evidence to be merged may be represented differently. This occurs, for example, when merging translation probabilities (e.g., from statistical alignment) with translation preference order (e.g., from a bilingual dictionary) in early fusion, or when merging document scores (i.e., "retrieval status values") from different retrieval systems in late fusion. In such cases, some form of score normalization is needed. This problem has been extensively studied in monolingual fusion applications and in CLIR experiments, and approaches like CombMNZ [88] with sum-to-one normalization that work well in monolingual settings also seem to work well in CLIR [122].
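For illustration, a minimal implementation of CombMNZ with sum-to-one normalization might look as follows (the run format is assumed; each system contributes a mapping from document identifiers to scores):

```python
# A sketch of CombMNZ late fusion with sum-to-one score normalization:
# each system's scores are normalized to sum to one, and a document's fused
# score is the sum of its normalized scores multiplied by the number of
# systems that retrieved it.
from collections import defaultdict

def comb_mnz(runs):
    """runs: list of {doc_id: score} dictionaries, one per retrieval system."""
    fused_sum = defaultdict(float)
    hit_count = defaultdict(int)
    for run in runs:
        total = sum(run.values()) or 1.0        # sum-to-one normalization
        for doc_id, score in run.items():
            fused_sum[doc_id] += score / total
            hit_count[doc_id] += 1
    fused = {d: fused_sum[d] * hit_count[d] for d in fused_sum}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)
```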

Ture et al. [178] conducted early fusion experiments with three ways of estimating term translation probabilities (one with language model context, one with term context, and one with no context), finding substantial improvements over using the best of those three representations alone. Ture and Boschee [177] later extended this approach to include learned query-specific combination weights for a similar set of three translation probabilities (two with language model context and one with no context), finding further improvements. Shing et al. [168] explored an alternative approach, learning late fusion combination weights without supervised (retrieval effectiveness) training by instead using data programming to estimate system weights. Nair et al. [122] compared early and late fusion using two approaches (one with language model context, one without), finding similar improvements over the best single system from both approaches. Finally, Li et al. [92] experimented with using cross-attention for early fusion between CLIR (for product descriptions) and product attribute matching in an e-commerce application.

One particularly important consideration when using neural methods is that they excel at generalization for cases in which there is adequate training data, but they have more difficulty modeling rare phenomena. In information retrieval, however, queries often tend to skew toward more specific, and thus less common, terms. One commonly used approach in IR with neural methods has been to combine neural and non-neural results, and good results have been reported for that approach in CLIR as well, both with early fusion [16] and with late fusion [161].

3.6 Cross-Language Speech Retrieval

We have so far described retrieval of content (and queries) represented as digital text. But CLIR can also be applied to content that is spoken in a language different from that of the queries. When doing CLIR on recorded speech, a straightforward approach would be to first perform Automatic Speech Recognition (ASR) and then to treat the resulting text as a text CLIR problem. This is the approach that was used by all of the participants in the first four shared-task evaluations on cross-language speech retrieval [43, 44, 186, 133, 137]. This can work well when ASR accuracy is high, but there are still many settings (e.g., conversational speech, or speech recorded under difficult acoustic conditions) in which one-best ASR transcripts exhibit substantial word error rates. As the throughput of commodity computing resources has dramatically increased over the last two decades, it has become more common to take advantage of the more complex internal representations of ASR systems.

Internally, ASR systems typically represent what might have been spoken as some form of confusion network (for what might have been spoken between fixed time points) or lattice (for what might have been spoken between arbitrary time points). These representations might be phonetic or word-based. For word-based confusion networks that are aligned to the one-best word boundaries, equations (1) and (2) in Section 2 are easily extended to incorporate an additional multiplicative factor for probabilities estimated from ASR confidence estimates. As Nair et al. [123] found, this can yield better CLIR results than would be achieved from one-best ASR. Yarmohammadi et al. [193] have also reported a beneficial effect from concatenating multiple recognition hypotheses from utterance-scale (i.e., sentence-like) confusion networks.
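As an illustration of the general idea (not the exact formulation of equations (1) and (2)), the expected frequency of a translated query term in a spoken document can be accumulated over confusion network positions, weighting each word hypothesis by its ASR posterior and by its translation probability:

```python
# A hedged illustration of combining ASR confidence with term translation:
# the expected count of a query-language term e in a spoken document is
# summed over confusion network positions, weighting each word hypothesis f
# by its ASR posterior and by P(e|f).  Input formats are assumptions.
def expected_translated_tf(query_term, confusion_network, p_translation):
    """confusion_network: list of positions, each a dict {word: asr_posterior};
    p_translation: dict mapping (query_term, doc_word) -> P(query_term | doc_word)."""
    expected_tf = 0.0
    for position in confusion_network:
        for word, asr_posterior in position.items():
            expected_tf += asr_posterior * p_translation.get((query_term, word), 0.0)
    return expected_tf
```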

Phonetic approaches to cross-language speech retrieval have also been explored. Because translation is typically performed on words, not phonemes, this approach is well matched to a query translation CLIR architecture. This is the approach that was tried in the very first reported work on cross-language speech retrieval [166], in which query terms were first translated using a comparable corpus, then rendered phonetically, and matched with phoneme trigrams for the spoken content that had been created using an ASR system.

In an earlier era, a survey on CLIR might have devoted considerable attention to the characteristics of specific languages. In recent years, however, CLIR has been evaluated on text and speech collections in a substantial number of language pairs without yet finding pairs for which the same basic approaches would not work.

Finally, it is worth noting that there has been a spurt of recent activity on cross-language speech retrieval as part of a large multi-site research program in the United States known as MATERIAL [197]. Two things distinguish MATERIAL from much of the prior work. First, the program is experimenting with nine less commonly studied languages (Bulgarian, Farsi, Kazakh, Lithuanian, Pashto, Somali, Swahili, Tagalog, and one language that is yet to be named). Using results from that program, Rubino [148] has found that differences in retrieval effectiveness were much better correlated with the amount of available training data for ASR and MT than with phonetic or syntactic similarity of the query and content languages. Second, the goal of a MATERIAL system is not to rank documents, but rather to select a set of relevant documents [72, 128]. Set selection is a challenging task, but it is a task that is well matched to the needs of speech retrieval applications. In text retrieval, the searcher might easily skim a ranked list of results, and in CLIR they might use MT to help them with that task. In speech retrieval, however, the linear nature of audio means that skimming can take longer, and the speech translation systems that would be needed to support that task for cross-language speech retrieval are still far less accurate than are current text translation systems. In some applications, it may therefore be useful to think of cross-language speech retrieval as a triage application, selecting documents that will then require time from expert speakers of the language. Systems being developed for MATERIAL (e.g., [17, 200, 48]) are thus useful as exemplars for complex cross-language text and speech retrieval systems that can draw on the full range of techniques described above.

4 Evaluation

As with IR generally, shared task evaluations and test collections have been driving forces in the development of CLIR, helping researchers to identify and collaborate on emerging problems, providing the training and evaluation data needed to make technical progress, and fostering comparability across research sites.

4.1 Shared-Task Evaluation

Nie provides an overview of the shared-task evaluations organized through 2010. At that time, only the TREC book had been published [182]; since then, edited volumes that review two of the other major CLIR evaluation venues (CLEF and NTCIR) have appeared [139, 153]. The Text REtrieval Conference (TREC) was the venue for the first cross-language retrieval tasks. Early work on CLIR was conducted as a side activity associated with work on non-English collections (in Spanish and Chinese). The main focus of the TREC CLIR tasks between 1997 [180] and 1999 [181] was on CLIR in multiple languages. It was there that a key decision that has influenced CLIR test collection design was first sorted out: the topics (from which the queries are generated) should be translated as natural expressions in the target language so as to more faithfully represent what a real searcher might pose to a CLIR system. Finally, early in the new century TREC returned to bilingual tasks involving Chinese or Arabic [51, 52, 132] as the focus on CLIR for European languages moved to the newly formed Cross-Language Evaluation Forum (CLEF) in Europe. As its name suggests, a range of CLIR-related tasks were organized at CLEF, including ad hoc CLIR and domain-specific CLIR, interactive CLIR, Cross-Language Spoken Document Retrieval, and Cross-Language Retrieval in Image Collections. In 2012, CLEF was opened to a broader range of text and multimedia processing tasks and renamed the Conference and Labs of the Evaluation Forum. The CLIR-related tasks after 2012 include the eHealth tasks, which also involve multilingual search in medical data, and the Multilingual Cultural Mining and Retrieval (MC2) lab.

The CLIR part of the MC2 lab task [29] was focused on microblog search. Short critiques of movies, written in French, were used as the queries. The main focus was on finding related microblogs in order to provide a diverse summary of each movie, based on comment and movie information in French, English, Spanish, Portuguese and Arabic. CLIR has been part of the eHealth tasks since 2014 [74]. That year, the task was oriented towards search by non-professional general public users. The collection contained one million medical documents. Queries were originally created for the monolingual task in English and were subsequently translated into Czech, French and German. Though the task in the following year [54] was still oriented towards the general public, queries were created to imitate users searching the database using observed medical symptoms. Arabic, Farsi and Portuguese were added to the query languages. ClueWeb12-B13 was used as the collection in the following years, and the queries were mined from web forums to better imitate realistic conditions [76, 55]. The list of languages also changed slightly, as Hungarian, Polish and Swedish were supported in addition to German, French and Czech. In 2018 [23], webpages from the CommonCrawl corpus were used as the collection, and general public queries issued to the HON9 search service were used.

CLIR tasks oriented at Japanese and East Asian languages were first organized at what is now called the NII Testbeds and Community for Information access Research (NTCIR) evaluations between 2002 and 2007. Related to CLIR, NTCIR provided Cross Language Q&A, Advanced Cross-Lingual Information Retrieval, Question Answering, Patent Retrieval, and Cross-lingual Link Discovery tasks. Since 2010, CLIR has in different forms been part of two tasks: the GeoTime task organized between 2010 and 2011, and the Crosslink task organized between 2011 and 2013. In the GeoTime task [49, 50], the search was constrained by geographic and temporal constraints using 'when' and 'where' queries. English and Japanese queries were used to search English and Japanese news data. The main focus of the Crosslink task [175, 176] was automatic linking of content to its related content.

9 https://www.hon.ch/en

The Forum for Information Retrieval Evaluation (FIRE) was created as a counterpart to the benchmarks described above, focusing on South Asian languages. Of these benchmarks, FIRE is the most recent, with the most CLIR tasks organized after 2010. These include Ad-hoc Cross-language Document Retrieval organized between 2010 and 2013, SMS-Based FAQ Retrieval organized between 2011 and 2012, Cross-Language !ndian News Story Search (CL!NSS) organized in 2013, and Mixed Script Information Retrieval organized between 2015 and 2016. Similarly to CLEF, the focus of FIRE shifted from the CLIR-oriented tasks organized at the beginning to broader multilingual access problems in recent years, such as language identification, event and information extraction, offensive content and hate speech detection, and fake news detection.

Recently, CLIR was also the focus of the OpenCLIR challenge,10 which used a Swahili test collection from the MATERIAL program.

4.2 Test Collections

Shared task evaluations provide test collections and an evaluation framework to their participants, often with an eye toward making those test collections available to future researchers. We focus in this section on test collections which are presently publicly available; an overview is given in Table 1. We list both CLIR test collections and test collections used for closely related problems, such as cross-language question answering. In addition to the shared task evaluation venues, which were traditionally the leaders in providing test collections, companies such as Microsoft and Facebook have recently provided access to a range of test collections, especially ones oriented towards question answering. These collections are typically much larger than the collections used in shared task evaluations, which is particularly important for training data-hungry neural techniques. In contrast, collections such as the Large-Scale CLIR Dataset11 by Sasaki et al. [158], CLIRMatrix12 by Sun and Duh [174], and the XQA13 open-domain question answering collection by Liu et al. [102] were created from large cross-language sources of online data, namely Wikipedia. Though these collections can also be useful for training neural models, their synthetic nature makes them more suitable for pretraining than for task-specific fine-tuning. We therefore omit such collections from our summary. Among question answering test collections, we focus only on those with broad language coverage, though more test collections focused on single languages also exist (e.g., MMQA [60]).

10 https://www.nist.gov/itl/iad/mig/openclir-challenge
11 http://www.cs.jhu.edu/~kevinduh/a/wikiclir2018
12 http://www.cs.jhu.edu/~shuosun/clirmatrix
13 https://github.com/thunlp/XQA


Collection | Published | #Queries | #Documents | Languages | Availability
The CLEF Test Suite for the CLEF 2000-2003 Campaigns – Evaluation Package [18, 19, 20]^14 | 2001-2003 | 40/50/50/60 (depends on the year and language) | 44-454k (depends on the language) | English, French, German, Italian, Spanish, Dutch, Swedish, Finnish, Russian, Portuguese | ELRA^15
TDT 1-5 [185, 26, 58] | 2001-2006 | 25/96/80/250 topics | 16-98k in total (depends on the year and collection) | Arabic, Chinese, English | LDC^16,17,18,19,20
NTCIR-3-6 CLIR: IR/CLIR Test Collection [89, 81, 1, 71] | 2002-2007 | 80/60/50/50 topics (depends on the year) | 10-900k (depends on the year and language) | Chinese, English, Japanese, Korean | Online application^21
CLEF Question Answering Test Suites (2003-2008) – Evaluation Package [108, 109, 179, 110, 53, 45] | 2003-2008 | 200 topics per year | 55k-454k (depends on the language) | Bulgarian, Dutch, English, Finnish, French, German, Italian, Portuguese, Romanian, Spanish | ELRA^22
NTCIR CLQA (Cross Language Q&A data Test Collection) [159, 160] | 2004-2007 | 200 topics per year | 10-900k (depends on the year and language) | Chinese, English, Japanese | Online application^23
CLEF AdHoc-News Test Suites (2004-2008) – Evaluation Package [138, 130, 37, 38, 2] | 2004-2008 | 50 topics per year | 17-454k (depends on the language) | 14 languages | ELRA^24
CLEF Domain Specific Test Suites (2004-2008) – Evaluation Package [82, 83, 172, 142, 141] | 2004-2008 | 25 topics per year | 20-303k (depends on the language) | English, German, Russian | ELRA^25
GALE Phase 1-4 [173, 9, 10] | 2005-2009 | depends on the collection | depends on the collection | Arabic, Chinese, English | LDC^26
NTCIR ACLIA (Advanced Cross-Lingual Information Retrieval and Question Answering) [152, 120] | 2008-2010 | 100 topics per language each year | 249k-1.6M (depends on the year and language) | Chinese, English, Japanese | Online application^27
FIRE Information-Retrieval Text Research Collection [121, 112, 111] | 2008-2012 | 50 topics per language each year | 95-500k (depends on the year and language) | Assamese, Bengali, English, Gujarati, Hindi, Odia, Marathi, Punjabi, Tamil, Telugu | Online application^28
NTCIR Crosslink (Cross-lingual Link Discovery) [175, 176] | 2010-2013 | 28 / 25 topics | 202k-3.6M (depends on the year and language) | Chinese, English, Japanese, Korean | Online application^29
CLEF eHealth 2014-2018 [74, 54, 76, 55, 23] | 2014-2018 | 50/50/67/300/50 topics (depends on the year) | 23k / 1M / 5.5M (depends on the year) | Arabic, Czech, English, Farsi, French, German, Hungarian, Polish, Swedish | Differs for the collections^30
Czech Malach Cross-lingual Speech Retrieval Test Collection | 2017 | 118 topics | 353 interviews (592 hours) | Czech, English, French, German, Spanish | CC-BY-NC-ND 4.0^31
Extended CLEF eHealth 2013-2015 IR Test Collection [155] | 2019 | 50/50/166 queries (depends on the year) | 1.1M | Czech, English, French, German, Hungarian, Polish, Spanish, Swedish | CC BY-NC 4.0^32
MLQA [90] | 2020 | around 5k Q/A pairs per language | 2-6k articles per language | English, Arabic, German, Spanish, Hindi, Vietnamese, Chinese | CC-BY-SA 3.0^33
MKQA: Multilingual Knowledge Questions & Answers [105] | 2020 | 10k Q/A pairs per language | - | 26 languages | CC BY-SA^34
XQuAD [7] | 2020 | 1190 Q/A pairs | 240 paragraphs | 11 languages | CC-BY-SA 4.0^35
XTREME [63] | 2020 | differs for the task | differs for the task | 40 languages | Apache License v2.0^36
XORQA [8] | 2020 | 2-3 Q/A pairs per language | - | Arabic, Bengali, Finnish, Japanese, Korean, Russian, Telugu | CC BY-SA 4.0^37

Table 1. Overview of the CLIR collections. If a collection contains more than 10 languages, we just list the number of languages.


One important caveat when using these collections is that older test collections may not include judgments for relevant documents that are found by newer systems [96]. This could produce the paradoxical result that better systems might get worse scores. Moreover, CLIR systems are subject to advances not just in the core IR technology, but also in the translation technology that they use, and (for cross-language speech retrieval) in the ASR technology that they use. A number of evaluation measures that are robust to random omissions of relevance judgments are known (e.g., infAP and bpref), but the problem with new methods tested on older collections is that the omissions may well not be random. Users of older test collections would thus be wise to check the number of highly ranked unjudged documents in their result sets, and in the result sets for the baselines to which they compare, and to caveat their results if substantial differences in the prevalence of unjudged documents are observed.
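That check is easy to automate; a minimal sketch (with assumed input formats) counts unjudged documents among the top-k results of a run:

```python
# A small sketch of the suggested sanity check: count how many of the top-k
# retrieved documents lack any relevance judgment, for both the new system
# and the baseline, and compare the two counts before trusting the scores.
def unjudged_at_k(ranked_doc_ids, judged_doc_ids, k=10):
    """Return the number of unjudged documents among the top-k results."""
    return sum(1 for doc_id in ranked_doc_ids[:k] if doc_id not in judged_doc_ids)
```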

Finally, we note that evaluation and design issues in IR broadly arise in CLIR as well. Most notably, comparison to weak baselines, as noted generally by Armstrong et al. [5], seems particularly problematic in CLIR. Hopefully, this survey can go some distance toward minimizing that problem going forward. Moreover, test collections on which CLIR results are reported are often smaller than those for IR more generally, which also means that greater attention to issues of experiment design and reporting [46, 151] continues to be particularly important. Finally, the development of standard CLIR systems that are widely available lags well behind the situation for IR more generally [39, 190], thus posing additional challenges for reproducibility.

14 Apart from the data, the CLEF Initiative topics, experiments, pools and performance measures from 2000 to 2013 are also available at http://direct.dei.unipd.it/.
15 https://catalogue.elra.info/en-us/repository/browse/ELRA-E0008/
16 https://catalog.ldc.upenn.edu/LDC98T25
17 https://catalog.ldc.upenn.edu/LDC2001T57
18 https://catalog.ldc.upenn.edu/LDC2001T58
19 https://catalog.ldc.upenn.edu/LDC2005T16
20 https://catalog.ldc.upenn.edu/LDC2006T19
21 https://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
22 http://catalog.elra.info/en-us/repository/browse/ELRA-E0038/
23 https://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
24 https://catalogue.elra.info/en-us/repository/browse/ELRA-E0036/
25 https://catalogue.elra.info/en-us/repository/browse/ELRA-E0037/
26 https://www.ldc.upenn.edu/collaborations/past-projects/gale/data/gale-pubs
27 https://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
28 http://fire.irsi.res.in/fire/static/data
29 https://www.nii.ac.jp/dsc/idr/en/ntcir/ntcir.html
30 https://sites.google.com/site/clefehealth/datasets
31 https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1912
32 https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2925
33 https://github.com/facebookresearch/mlqa
34 https://github.com/apple/ml-mkqa
35 https://github.com/deepmind/xquad
36 https://github.com/google-research/xtreme
37 https://nlp.cs.washington.edu/xorqa

5 Future Directions

As the development of CLIR techniques is closely tied to developments in IR and natural language processing, it can be expected that trends in those areas will be reflected in CLIR. Especially because neural approaches have only recently risen to prominence in IR, further growth of these techniques in CLIR can also be expected. With the dominance of neural machine translation techniques in recent years, it can be expected that closer integration of machine translation and information retrieval techniques, such as joint training and tuning, will be further explored. Apart from deep neural networks, other machine learning concepts, such as reinforcement learning, imitation learning, bandit algorithms or metalearning, may also find their way into CLIR.

The problem of interaction between the IR system and the user also exists in CLIR. There has been far less reported research on interaction design for CLIR than on monolingual retrieval, and the work that has been done (e.g., [135, 143, 201, 97]) points to a number of open research questions. Interactive CLIR systems could, for example, use information about the preferred languages of the original documents, with the user putting more weight on some languages than on others depending on the topic of interest. Subsequently, the presentation of a document might also differ based on the language of its origin and the user's preferences. Another related problem will be the monetization of such interactive search engines. If the search engine chooses to present advertisements to the user, they must be in a language the user understands. Moreover, the advertisements should be connected with the topic of the search and the search results, and they need to be for products or services that can actually be provided to that searcher.

One natural direction for CLIR research is the continuation of its integration with multimedia retrieval. With the rise of voice assistants (e.g., 39.4% of Internet users in the US used a voice assistant at least once a month in 2019)38 and thus the increasing popularity of conversational search, cross-language conversational search may become another emerging topic. As language support by voice assistants is still somewhat limited,39 one may expect that CLIR in conversational search might become particularly important for languages with fewer speakers. In such cases, the relevant content might only exist in a language different from the language of the speaker, and the voice assistant will need to decide whether and how to present answers from such content to the speaker. Cross-language conversational search will thus raise new challenges such as determining the probability that the top relevant document really contains the answer to the user's question, determining whether the answer differs across languages, or exploring how to present the answer to the user if it only exists in a language different from the query language.

38 https://mobidev.biz/blog/future-ai-machine-learning-trends-to-impact-business
39 https://www.globalme.net/blog/language-support-voice-assistants-compared


Language-based search in speech and video is still an under-explored area, given the vast quantities of content that might be found in such sources. Monolingual search in podcasts has recently received attention thanks to increasing commercial interest in that content,40 and podcasts offer a natural source of content for cross-language speech retrieval. CLIR for spoken content is today seen as a straightforward extension to text CLIR, but of course speech is richer in some ways than text (e.g., with evidence for mood, emotions, or speaker accent). It seems plausible that at least some of this potential will one day be explored in a CLIR context.

Image content might be easier to present to speakers of different languages than speech or text, as noted by Clough and Sanderson [28]. Indeed, caption-based cross-language image search has been the focus of tasks at both ImageCLEF and the iCLEF interactive task. Work on this problem has continued in recent years with CLIR for metadata that describes both images [116, 94] and video [79]. The recent emergence of systems that learn to generate captions directly from image content opens new directions for image retrieval that have yet to be explored in CLIR applications.

Another emerging topic in information retrieval is fairness and bias. Similarly to neural networks and conversational search, these were identified as emerging IR topics by the SWIRL report [3], and they might reasonably be expected to ultimately be as important in CLIR research as in monolingual IR. A system might be biased towards returning documents from a certain category (e.g., research papers from universities might be returned more often than research papers from government organizations). Especially in multilingual CLIR, one might ask whether the system is (or should be!) biased towards returning documents from particular languages. So, as with many things, CLIR simply adds an additional dimension of complexity.

As we have sought to illustrate in this survey, CLIR is yesterday's problem: much of the groundwork has already been laid. It is also today's problem, since without it the Web offers a vast cornucopia of knowledge that most of its users simply would not be able to find. And it is tomorrow's problem, since wherever IR research goes, CLIR is sure to follow!

Acknowledgements

This work has been supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract FA8650-17-C-9117. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

40 https://podcastsdataset.byspotify.com

Bibliography

[1] S. Abdou and J. Savoy. Report on CLIR task for the NTCIR-5 evalua-tion campaign. In Proceedings of the Fifth NTCIR Workshop Meeting onEvaluation of Information Access Technologies, Tokyo, Japan, 2005.

[2] E. Agirre, G. M. Di Nunzio, N. Ferro, T. Mandl, and C. Peters. CLEF2008: Ad Hoc track overview. In Evaluating Systems for Multilingual andMultimodal Information Access, pages 15–37, Aarhus, Denmark, 2009.

[3] J. Allan and et al. Research frontiers in information retrieval: Report fromthe third strategic workshop on information retrieval in Lorne (SWIRL2018). SIGIR Forum, 52(1):34–90, Aug. 2018.

[4] W. Ammar, G. Mulcaire, Y. Tsvetkov, G. Lample, C. Dyer, and N. A.Smith. Massively multilingual word embeddings. arXiv:1602.01925, 2016.

[5] T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvementsthat don’t add up: Ad-hoc retrieval results since 1998. In Proceedings ofCIKM, page 601–610, Hong Kong, China, 2009.

[6] M. Artetxe, G. Labaka, and E. Agirre. A robust self-learning method forfully unsupervised cross-lingual mappings of word embeddings. In Pro-ceedings of the 56th ACL, pages 789–798, Melbourne, Australia, 2018.

[7] M. Artetxe, S. Ruder, and D. Yogatama. On the cross-lingual transfer-ability of monolingual representations. In Proceedings of the 58th AnnualMeeting of the Association for Computational Linguistics, pages 4623–4637, Online, July 2020. ACL.

[8] A. Asai, J. Kasai, J. H. Clark, K. Lee, E. Choi, and H. Hajishirzi. Xor qa:Cross-lingual open-retrieval question answering, 2020.

[9] O. Babko-Malaya. Annotation of nuggets and relevance in GALE distilla-tion evaluation. In Proceedings of the LREC, Marrakech, Morocco, 2008.

[10] O. Babko-Malaya, D. Hunter, C. Fournelle, and J. White. Evaluation ofdocument citations in phase 2 GALE distillation. In Proceedings of theSeventh LREC, Valletta, Malta, 2010.

[11] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation byjointly learning to align and translate. arXiv preprint arXiv:1409.0473,2014.

[12] L. Ballesteros and M. Sanderson. Addressing the lack of direct translationresources for cross-language retrieval. In Proceedings of the 2003 ACMCIKM, pages 147–152, New Orleans, Louisiana, USA, 2003.

[13] P. Bhattacharya, P. Goyal, and S. Sarkar. Query translation for cross-language information retrieval using multilingual word clusters. In Pro-ceedings of the Workshop on South Southeast Asian Natural Language Pro-cessing (WSSANLP), pages 152–162, Osaka, Japan, 2016.

[14] P. Bhattacharya, P. Goyal, and S. Sarkar. Using word embeddings forquery translation for Hindi to English cross language information retrieval.Computacion y Sistemas, 20(3):435–447, 2016.


[15] T. Bi, L. Yao, B. Yang, H. Zhang, W. Luo, and B. Chen. Constrainttranslation candidates: A bridge between neural query translation andcross-lingual information retrieval. arXiv preprint arXiv:2010.13658, 2020.

[16] H. Bonab, S. M. Sarwar, and J. Allan. Training effective neural CLIR bybridging the translation gap. In Proceedings of SIGIR, page 9–18, Xi’an,China, 2020.

[17] E. Boschee, J. Barry, J. Billa, M. Freedman, T. Gowda, C. Lignos,C. Palen-Michel, M. Pust, B. K. Khonglah, S. Madikeri, J. May, andS. Miller. SARAL: A low-resource cross-lingual domain-focused informa-tion retrieval system for effective rapid document triage. In Proceedingsof ACL: System Demonstrations, pages 19–24, Florence, Italy, July 2019.Association for Computational Linguistics.

[18] M. Braschler. CLEF 2000 — overview of results. In Cross-Language In-formation Retrieval and Evaluation, pages 89–101, Darmstadt, Germany,2001.

[19] M. Braschler. CLEF 2001— overview of results. In C. Peters, M. Braschler,J. Gonzalo, and M. Kluck, editors, Evaluation of Cross-Language Infor-mation Retrieval Systems, pages 9–26, Berlin, Heidelberg, 2002. SpringerBerlin Heidelberg. ISBN 978-3-540-45691-9.

[20] M. Braschler. CLEF 2002— overview of results. In C. Peters, M. Braschler,J. Gonzalo, and M. Kluck, editors, Advances in Cross-Language Informa-tion Retrieval, pages 9–27, 2003. ISBN 978-3-540-45237-9.

[21] M. Braschler. Combination approaches for multilingual text retrieval. In-formation Retrieval, 7:183–204, 01 2004.

[22] C. Buckley, M. Mitra, J. Walz, and C. Cardie. Using clustering and su-perconcepts within SMART: TREC 6. Information Processing & Manage-ment, 36(1):109–131, 2000.

[23] L. Cappellato, N. Ferro, J. Nie, and L. Soulier, editors. Working Notesof CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon,France, September 10-14, 2018, volume 2125 of CEUR Workshop Proceed-ings, 2018. CEUR-WS.org.

[24] N. Chen, R. E. Banchs, M. Zhang, X. Duan, and H. Li. Report of NEWS2018 named entity transliteration shared task. In Proceedings of the Sev-enth Named Entities Workshop, pages 55–73, Melbourne, Australia, July2018. Association for Computational Linguistics.

[25] X. Chen and C. Cardie. Unsupervised multilingual word embeddings.In Proceedings of the 2018 Conference on Empirical Methods in NaturalLanguage Processing, Brussels, Belgium, Oct.-Nov. 2018. ACL.

[26] C. Cieri, D. Graff, M. Liberman, N. Martey, and S. Strassel. The tdt-2 textand speech corpus. Proceedings of DARPA Broadcast News Workshop, 082000.

[27] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning. Electra: Pre-training text encoders as discriminators rather than generators. arXivpreprint arXiv:2003.10555, 2020.

[28] P. Clough and M. Sanderson. User experiments with the Eurovision cross-language image retrieval system. Journal of the American Society for Information Science and Technology, 57(5):697–708, March 2006.

[29] J. Cossu, J. Gonzalo, M. Hajjem, O. Hamon, C. Latiri, and E. San-Juan. CLEF MC2 2018 lab technical overview of cross language microblogsearch and argumentative mining. In L. Cappellato, N. Ferro, J. Nie, andL. Soulier, editors, Working Notes of CLEF 2018 - Conference and Labs ofthe Evaluation Forum, Avignon, France, September 10-14, 2018, volume2125 of CEUR Workshop Proceedings. CEUR-WS.org, 2018.

[30] M. Coury, E. Salesky, and J. Drexler. Finding relevant data in a sea of languages. Technical report, MIT Lincoln Laboratory, 2016.

[31] W. B. Croft. Combining approaches to information retrieval. In W. B.Croft, editor, Advances in Information Retrieval: Recent Research from theCenter for Intelligent Information Retrieval, pages 1–36. Springer, Boston,MA, 2002. ISBN 978-0-306-47019-6.

[32] J. S. Culpepper and O. Kurland. Fusion in information retrieval. SIGIRTutorial, 2018.

[33] G. Da San Martino, S. Romeo, A. Barrón-Cedeño, S. Joty, L. Màrquez, A. Moschitti, and P. Nakov. Cross-language question re-ranking. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, page 1145–1148, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450350228.

[34] Z. Dai and J. Callan. Deeper text understanding for ir with contextualneural language modeling. In Proceedings of the 42nd International ACMSIGIR Conference on Research and Development in Information Retrieval,pages 985–988, 2019.

[35] K. Darwish and D. W. Oard. Probabilistic structured query methods. InC. L. A. Clarke, G. V. Cormack, J. Callan, D. Hawking, and A. F. Smeaton,editors, SIGIR 2003: Proceedings of the 26th Annual International ACMSIGIR Conference on Research and Development in Information Retrieval,July 28 - August 1, 2003, Toronto, Canada, pages 338–344. ACM, 2003.

[36] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training ofdeep bidirectional transformers for language understanding. arXiv preprintarXiv:1810.04805, 2018.

[37] G. M. Di Nunzio, N. Ferro, T. Mandl, and C. Peters. CLEF 2006: Ad hoctrack overview. In C. Peters, P. Clough, F. C. Gey, J. Karlgren, B. Magnini,D. W. Oard, M. de Rijke, and M. Stempfhuber, editors, Evaluation ofMultilingual and Multi-modal Information Retrieval, pages 21–34, Berlin,Heidelberg, 2007. Springer Berlin Heidelberg. ISBN 978-3-540-74999-8.

[38] G. M. Di Nunzio, N. Ferro, T. Mandl, and C. Peters. CLEF 2007: Ad hoctrack overview. In C. Peters, V. Jijkoun, T. Mandl, H. Muller, D. W. Oard,A. Penas, V. Petras, and D. Santos, editors, Advances in Multilingual andMultimodal Information Retrieval, pages 13–32, Berlin, Heidelberg, 2008.Springer Berlin Heidelberg. ISBN 978-3-540-85760-0.

[39] F. Diaz. indri, 2018. URL https://github.com/diazf/indri.


[40] S. Dwivedi and G. Chandra. A survey on cross language informationretrieval. International Journal on Cybernetics & Informatics, 5:127–142,02 2016.

[41] M. Faruqui and C. Dyer. Improving vector space word representationsusing multilingual correlation. In Proceedings of the 14th Conference ofthe European Chapter of the Association for Computational Linguistics,pages 462–471, 2014.

[42] M. Federico and N. Bertoldi. Statistical cross-language information re-trieval using n-best query translations. In K. Jarvelin, M. Beaulieu, R. A.Baeza-Yates, and S. Myaeng, editors, SIGIR 2002: Proceedings of the 25thAnnual International ACM SIGIR Conference on Research and Develop-ment in Information Retrieval, August 11-15, 2002, Tampere, Finland,pages 167–174. ACM, 2002.

[43] M. Federico and G. Jones. The CLEF 2003 cross-language spoken docu-ment retrieval track. In CLEF, volume 3237, page 646, 08 2003.

[44] M. Federico, N. Bertoldi, G.-A. Levow, and G. J. F. Jones. CLEF 2004cross-language spoken document retrieval track. In C. Peters, P. Clough,J. Gonzalo, G. J. F. Jones, M. Kluck, and B. Magnini, editors, MultilingualInformation Access for Text, Speech and Images, pages 816–820, Berlin,Heidelberg, 2005. Springer Berlin Heidelberg.

[45] P. Forner, A. Penas, E. Agirre, I. Alegria, C. Forascu, N. Moreau, P. Osen-ova, P. Prokopidis, P. Rocha, B. Sacaleanu, R. Sutcliffe, and E. Sang.Overview of the CLEF 2008 multilingual question answering track. InCLEF, pages 262–295, 01 2008.

[46] N. Fuhr. Some common mistakes in IR evaluation, and how they can beavoided. SIGIR Forum, 51(3):32–41, 2017.

[47] G. W. Furnas, S. C. Deerwester, S. T. Dumais, T. K. Landauer, R. A.Harshman, L. A. Streeter, and K. E. Lochbaum. Information retrievalusing a singular value decomposition model of latent semantic structure.In Y. Chiaramella, editor, SIGIR’88, Proceedings of the 11th Annual In-ternational ACM SIGIR Conference on Research and Development in In-formation Retrieval, Grenoble, France, June 13-15, 1988, pages 465–480.ACM, 1988.

[48] P. Galuscakova, D. W. Oard, J. Barrow, S. Nair, H.-C. Shing, E. Zotkina,R. Eskander, and R. Zhang. MATERIALizing Cross-Language InformationRetrieval: A Snapshot. In Proceedings of the Cross-Language Search andSummarization of Text and Speech Workshop, pages 14–21, 2020.

[49] F. Gey, R. Larson, N. Kando, J. Machado, and T. Sakai. NTCIR-GeoTimeoverview: Evaluating geographic and temporal search. In NTCIR, 07 2010.

[50] F. Gey, R. Larson, J. Machado, and M. Yoshio. NTCIR9-GeoTimeoverview-evaluating geographic and temporal search: Round 2. Proceedingsof NTCIR-9 Workshop, pages 9–17, 01 2011.

[51] F. C. Gey and A. Chen. TREC-9 cross-language information retrieval (English-Chinese) overview. In E. M. Voorhees and D. K. Harman, editors, Proceedings of The Ninth Text REtrieval Conference, TREC 2000, Gaithersburg, Maryland, USA, November 13-16, 2000, volume 500-249 of NIST Special Publication. National Institute of Standards and Technology (NIST), 2000.

[52] F. C. Gey and D. W. Oard. The TREC-2001 cross-language informationretrieval track: Searching Arabic using English, French or Arabic queries.In E. M. Voorhees and D. K. Harman, editors, Proceedings of The TenthText REtrieval Conference, TREC 2001, Gaithersburg, Maryland, USA,November 13-16, 2001, volume 500-250 of NIST Special Publication. Na-tional Institute of Standards and Technology (NIST), 2001.

[53] D. Giampiccolo, P. Forner, J. Herrera, A. Penas, C. Ayache, C. Forascu,V. Jijkoun, P. Osenova, P. Rocha, B. Sacaleanu, and R. Sutcliffe. Overviewof the CLEF 2007 multilingual question answering track. In C. Peters,V. Jijkoun, T. Mandl, H. Muller, D. W. Oard, A. Penas, V. Petras, andD. Santos, editors, Advances in Multilingual and Multimodal InformationRetrieval, pages 200–236, Berlin, Heidelberg, 2008. Springer Berlin Heidel-berg. ISBN 978-3-540-85760-0.

[54] L. Goeuriot, L. Kelly, H. Suominen, L. Hanlen, A. Neveol, C. Grouin,J. Palotti, and G. Zuccon. Overview of the CLEF ehealth evaluation lab2015. In J. Mothe, J. Savoy, J. Kamps, K. Pinel-Sauvagnat, G. Jones,E. San Juan, L. Capellato, and N. Ferro, editors, Experimental IR MeetsMultilinguality, Multimodality, and Interaction, pages 429–443, Cham,2015. Springer International Publishing. ISBN 978-3-319-24027-5.

[55] L. Goeuriot, L. Kelly, H. Suominen, A. Neveol, A. Robert, E. Kanoulas,R. Spijker, J. Palotti, and G. Zuccon. CLEF 2017 ehealth evaluation laboverview. In CLEF, pages 291–303, 08 2017. ISBN 978-3-319-65812-4.

[56] J. Gonzalo and D. W. Oard. iCLEF 2004 track overview: Interactive cross-language question answering. In F. Borri, C. Peters, and N. Ferro, editors,Working Notes for CLEF 2004 Workshop co-located with the 8th EuropeanConference on Digital Libraries (ECDL 2004), Bath, UK, September 15-17, 2004, volume 1170 of CEUR Workshop Proceedings. CEUR-WS.org,2004.

[57] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks.arXiv preprint arXiv:1406.2661, 2014.

[58] D. Graff, C. Cieri, S. Strassel, and N. Martey. The tdt-3 text and speechcorpus. In in Proceedings of DARPA Broadcast News Workshop, pages57–60. Morgan Kaufmann, 1999.

[59] J. Guo, Y. Fan, Q. Ai, and W. B. Croft. A deep relevance matchingmodel for ad-hoc retrieval. In Proceedings of the 25th ACM Internationalon Conference on Information and Knowledge Management, pages 55–64,2016.

[60] D. Gupta, S. Kumari, A. Ekbal, and P. Bhattacharyya. MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi. In N. Calzolari (Conference Chair), K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, and T. Tokunaga, editors, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 7–12, 2018. European Language Resources Association (ELRA). ISBN 979-10-95546-00-9.

[61] P. Gupta, R. E. Banchs, and P. Rosso. Continuous space models for clir.Information Processing & Management, 53(2):359–370, 2017.

[62] Y. Hoshen and L. Wolf. Non-adversarial unsupervised word translation.In Proceedings of the 2018 Conference on Empirical Methods in NaturalLanguage Processing, pages 469–478, Brussels, Belgium, Oct.-Nov. 2018.Association for Computational Linguistics.

[63] J. Hu, S. Ruder, A. Siddhant, G. Neubig, O. Firat, and M. Johnson.Xtreme: A massively multilingual multi-task benchmark for evaluatingcross-lingual generalization, 2020.

[64] Q. Hu, H.-F. Yu, V. Narayanan, I. Davchev, R. Bhagat, and I. S. Dhillon.Query transformation for multi-lingual product search. In SIGIR cCommWorkshop, 2020.

[65] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learningdeep structured semantic models for web search using clickthrough data.In Proceedings of the 22nd ACM international conference on Information& Knowledge Management, pages 2333–2338, 2013.

[66] K. Hui, A. Yates, K. Berberich, and G. De Melo. Co-pacrr: A context-awareneural ir model for ad-hoc retrieval. In Proceedings of the eleventh ACMinternational conference on web search and data mining, pages 279–287,2018.

[67] Stanford Human-Centered Artificial Intelligence. Artificial Intelligence Index annual report 2019, 2019. URL https://hai.stanford.edu/sites/default/files/ai_index_2019_report.pdf.

[68] Z. Jiang, A. El-Jaroudi, W. Hartmann, D. Karakos, and L. Zhao. Cross-lingual information retrieval with BERT. In Proceedings of the work-shop on Cross-Language Search and Summarization of Text and Speech(CLSSTS2020), pages 26–31, Marseille, France, May 2020. European Lan-guage Resources Association. ISBN 979-10-95546-55-9.

[69] Jimmy, G. Zuccon, J. Palotti, L. Goeuriot, and L. Kelly. Overview of theCLEF 2018 consumer health search task. In CLEF, 2018.

[70] A. Joulin, P. Bojanowski, T. Mikolov, H. Jegou, and E. Grave. Loss intranslation: Learning bilingual word mapping with a retrieval criterion.In Proceedings of the 2018 Conference on Empirical Methods in NaturalLanguage Processing, pages 2979–2984, Brussels, Belgium, Oct.-Nov. 2018.ACL.

[71] N. Kando. Overview of the sixth NTCIR workshop. In NTCIR, 2007.

[72] D. Karakos, R. Zbib, W. Hartmann, R. Schwartz, and J. Makhoul. Reformulating Information Retrieval from Speech and Text as a Detection Problem. In Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, pages 38–43, 2020.

[73] K. Kayode and E. Ayetiran. Survey on cross-lingual information retrieval.International Journal of Scientific and Engineering Research, 9, 10 2018.

[74] L. Kelly, L. Goeuriot, H. Suominen, T. Schreck, G. Leroy, D. Mowery, S. Velupillai, W. Chapman, D. Martinez, G. Zuccon, and J. Palotti. Overview of the ShARe/CLEF eHealth evaluation lab 2014. In CLEF, volume 8685, 09 2014.

[75] L. Kelly, L. Goeuriot, H. Suominen, A. Neveol, J. Palotti, and G. Zuc-con. Overview of the CLEF ehealth evaluation lab 2016. In N. Fuhr,P. Quaresma, T. Goncalves, B. Larsen, K. Balog, C. Macdonald, L. Cap-pellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Mul-timodality, and Interaction, pages 255–266, Cham, 2016. Springer Interna-tional Publishing. ISBN 978-3-319-44564-9.

[76] L. Kelly, L. Goeuriot, H. Suominen, A. Neveol, J. Palotti, and G. Zuccon.Overview of the CLEF ehealth evaluation lab 2016. In CLEF, volume9822, pages 255–266, 09 2016. ISBN 978-3-319-44563-2.

[77] L. Kelly, H. Suominen, L. Goeuriot, M. Neves, E. Kanoulas, D. Li, L. Az-zopardi, R. Spijker, G. Zuccon, H. Scells, and J. Palotti. Overview of theCLEF ehealth evaluation lab 2019. In F. Crestani, M. Braschler, J. Savoy,A. Rauber, H. Muller, D. E. Losada, G. Heinatz Burki, L. Cappellato,and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodal-ity, and Interaction, pages 322–339, Cham, 2019. Springer InternationalPublishing. ISBN 978-3-030-28577-7.

[78] O. Khattab and M. Zaharia. Colbert: Efficient and effective passage searchvia contextualized late interaction over bert. In Proceedings of the 43rdInternational ACM SIGIR Conference on Research and Development inInformation Retrieval, SIGIR ’20, page 39–48, New York, NY, USA, 2020.Association for Computing Machinery. ISBN 9781450380164.

[79] A. Khwileh, D. Ganguly, and G. J. F. Jones. Utilisation of metadatafields and query expansion in cross-lingual search of user-generated internetvideo. J. Artif. Int. Res., 55(1):249–281, Jan. 2016. ISSN 1076-9757.

[80] K. Kishida and N. Kando. A hybrid approach to query and documenttranslation using a pivot language for cross-language information retrieval.In C. Peters, F. C. Gey, J. Gonzalo, H. Muller, G. J. F. Jones, M. Kluck,B. Magnini, and M. de Rijke, editors, Accessing Multilingual InformationRepositories, pages 93–101, Berlin, Heidelberg, 2006. Springer Berlin Hei-delberg. ISBN 978-3-540-45700-8.

[81] K. Kishida, K. Chen, S. Lee, K. Kuriyama, N. Kando, H. Chen, S. Myaeng,and K. Eguchi. Overview of CLIR task at the fourth NTCIR workshop.In N. Kando and H. Ishikawa, editors, Proceedings of the Fourth NTCIRWorkshop on Research in Information Access Technologies InformationRetrieval, Question Answering and Summarization, NTCIR-4, NationalCenter of Sciences, Tokyo, Japan, June 2-4, 2004. National Institute ofInformatics (NII), 2004.

[82] M. Kluck. The domain-specific track in CLEF 2004: Overview of theresults and remarks on the assessment process. In C. Peters, P. Clough,J. Gonzalo, G. J. F. Jones, M. Kluck, and B. Magnini, editors, MultilingualInformation Access for Text, Speech and Images, pages 260–270, Berlin,Heidelberg, 2005. Springer Berlin Heidelberg. ISBN 978-3-540-32051-7.

[83] M. Kluck and M. Stempfhuber. Domain-specific track CLEF 2005: Overview of results and approaches, remarks on the assessment analysis. In C. Peters, F. C. Gey, J. Gonzalo, H. Muller, G. J. F. Jones, M. Kluck, B. Magnini, and M. de Rijke, editors, Accessing Multilingual Information Repositories, pages 212–221, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-45700-8.

[84] T. Kudo and J. Richardson. SentencePiece: A simple and language inde-pendent subword tokenizer and detokenizer for neural text processing. InProceedings of the 2018 Conference on Empirical Methods in Natural Lan-guage Processing: System Demonstrations, pages 66–71, Brussels, Belgium,Nov. 2018. ACL.

[85] G. Lample and A. Conneau. Cross-lingual language model pretraining.Advances in Neural Information Processing Systems (NeurIPS), 2019.

[86] G. Lample, A. Conneau, M. Ranzato, L. Denoyer, and H. Jegou. Wordtranslation without parallel data. In 6th International Conference onLearning Representations, ICLR 2018, Vancouver, BC, Canada, April 30- May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.

[87] T. K. Landauer and M. L. Littman. A statistical method for language-independent representation of the topical content of text segments. In Pro-ceedings of the 11th International Conference: Expert Systems and TheirApplications, 1991, 1991.

[88] J. H. Lee. Analyses of multiple evidence combination. In Proceedings of the20th Annual International ACM SIGIR Conference on Research and De-velopment in Information Retrieval, SIGIR ’97, page 267–276, New York,NY, USA, 1997. Association for Computing Machinery. ISBN 0897918363.

[89] S. Lee, S.-H. Myaeng, H. Kim, J. Seo, B. Lee, and S. Cho. Characteristics of the Korean test collection for CLIR in NTCIR-3. In NTCIR, 01 2002.

[90] P. Lewis, B. Oguz, R. Rinott, S. Riedel, and H. Schwenk. Mlqa:Evaluating cross-lingual extractive question answering. arXiv preprintarXiv:1910.07475, 2019.

[91] B. Li and P. Cheng. Learning neural representation for clir with adversarialframework. In Proceedings of the 2018 Conference on Empirical Methodsin Natural Language Processing, pages 1861–1870, 2018.

[92] J. Li, C. Liu, J. Wang, L. Bing, H. Li, X. Liu, D. Zhao, and R. Yan. Cross-lingual low-resource set-to-description retrieval for global e-commerce.In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI2020, The Thirty-Second Innovative Applications of Artificial IntelligenceConference, IAAI 2020, The Tenth AAAI Symposium on Educational Ad-vances in Artificial Intelligence, EAAI 2020, New York, NY, USA, Febru-ary 7-12, 2020, pages 8212–8219. AAAI Press, 2020.

[93] Q. Li, S. H. Myaeng, Y. Jin, and B.-Y. Kang. Translation of unknown termsvia web mining for information retrieval. In Asia Information RetrievalSymposium, pages 258–269. Springer, 2006.

[94] X. Li, C. Xu, X. Wang, W. Lan, Z. Jia, G. Yang, and J. Xu. Coco-cn forcross-lingual image tagging, captioning, and retrieval. IEEE Transactionson Multimedia, 21(9):2347–2360, 2019.

[95] J. Lin. The neural hype, justified! a recantation. SIGIR Forum, 53(2):88–93, 2019. ISSN 0163-5840.


[96] J. Lin, R. Nogueira, and A. Yates. Pretrained transformers for text rank-ing: BERT and beyond, 2020.

[97] C. Ling, B. Steichen, and A. G. Choulos. A comparative user study ofinteractive multilingual search interfaces. In Proceedings of the 2018 Con-ference on Human Information Interaction & Retrieval, CHIIR ’18, page211–220, New York, NY, USA, 2018. Association for Computing Machin-ery. ISBN 9781450349253.

[98] R. Litschko, G. Glavas, S. P. Ponzetto, and I. Vulic. Unsupervised cross-lingual information retrieval using monolingual data only. In The 41stInternational ACM SIGIR Conference on Research & Development in In-formation Retrieval, pages 1253–1256, 2018.

[99] R. Litschko, G. Glavas, I. Vulic, and L. Dietz. Evaluating resource-leancross-lingual embedding models in unsupervised retrieval. In Proceedingsof the 42nd International ACM SIGIR Conference on Research and De-velopment in Information Retrieval, pages 1109–1112, 2019.

[100] M. L. Littman, S. T. Dumais, and T. K. Landauer. Automatic cross-language information retrieval using latent semantic indexing. In Cross-language information retrieval, pages 51–62. Springer, 1998.

[101] M. L. Littman, F. Jiang, and G. A. Keim. Learning a language-independentrepresentation for terms from a partially aligned corpus. In ICML, pages314–322, 1998.

[102] J. Liu, Y. Lin, Z. Liu, and M. Sun. XQA: A cross-lingual open-domainquestion answering dataset. In Proceedings of ACL, pages 2358–2368, Flo-rence, Italy, July 2019. ACL.

[103] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis,L. Zettlemoyer, and V. Stoyanov. Roberta: A robustly optimized bertpretraining approach. arXiv preprint arXiv:1907.11692, 2019.

[104] E. Loginova, S. Varanasi, and G. Neumann. Towards multilingual neuralquestion answering. In A. Benczur, B. Thalheim, T. Horvath, S. Chiusano,T. Cerquitelli, C. Sidlo, and P. Z. Revesz, editors, New Trends in Databasesand Information Systems, pages 274–285, Cham, 2018. Springer Interna-tional Publishing.

[105] S. Longpre, Y. Lu, and J. Daiber. MKQA: A linguistically diversebenchmark for multilingual open domain question answering. CoRR,abs/2007.15207, 2020.

[106] M. Lupu, A. Fujii, D. W. Oard, M. Iwayama, and N. Kando. Patent-Related Tasks at NTCIR, pages 77–111. Springer Berlin Heidelberg, Berlin,Heidelberg, 2017. ISBN 978-3-662-53817-3.

[107] S. MacAvaney, F. M. Nardini, R. Perego, N. Tonellotto, N. Goharian, andO. Frieder. Efficient document re-ranking for transformers by precomput-ing term representations. In Proceedings of the 43rd International ACMSIGIR Conference on Research and Development in Information Retrieval,SIGIR ’20, page 49–58, New York, NY, USA, 2020. Association for Com-puting Machinery. ISBN 9781450380164.


[108] B. Magnini, S. Romagnoli, A. Vallin, J. Herrera, A. Penas, V. Peinado, M. Verdejo, and M. de Rijke. The multiple language question answering track at CLEF 2003. In CLEF, volume 3237, 2003.

[109] B. Magnini, A. Vallin, C. Ayache, G. Erbach, A. Penas, M. de Rijke, P. Rocha, K. Simov, and R. Sutcliffe. Overview of the CLEF 2004 multilingual question answering track. In C. Peters, P. Clough, J. Gonzalo, G. J. F. Jones, M. Kluck, and B. Magnini, editors, Multilingual Information Access for Text, Speech and Images, pages 371–391, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg. ISBN 978-3-540-32051-7.

[110] B. Magnini, D. Giampiccolo, P. Forner, C. Ayache, V. Jijkoun, P. Osenova, A. Penas, P. Rocha, B. Sacaleanu, and R. Sutcliffe. Overview of the CLEF 2006 multilingual question answering track. In C. Peters, P. Clough, F. C. Gey, J. Karlgren, B. Magnini, D. W. Oard, M. de Rijke, and M. Stempfhuber, editors, Evaluation of Multilingual and Multi-modal Information Retrieval, pages 223–256, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. ISBN 978-3-540-74999-8.

[111] P. Majumder et al., editors. FIRE 2012 & 2013: Post-Proceedings of the 4th and 5th Workshops of the Forum for Information Retrieval Evaluation: Fourth International Workshop, FIRE 2012, Kolkata, India, December 19-21, 2012 and Fifth International Workshop, FIRE 2013, New Delhi, India. ACM.

[112] P. Majumder, M. Mitra, P. Bhattacharyya, L. V. Subramaniam, D. Contractor, and P. Rosso, editors. Multilingual Information Access in South Asian Languages - Second International Workshop, FIRE 2010, Gandhinagar, India, February 19-21, 2010 and Third International Workshop, FIRE 2011, Bombay, India, December 2-4, 2011, Revised Selected Papers, volume 7536 of Lecture Notes in Computer Science, 2013. Springer. ISBN 978-3-642-40086-5.

[113] J. S. McCarley. Should we translate the documents or the queries in cross-language information retrieval? In R. Dale and K. W. Church, editors, 27th Annual Meeting of the Association for Computational Linguistics, University of Maryland, College Park, Maryland, USA, 20-26 June 1999, pages 208–214. ACL, 1999.

[114] P. McNamee. Textual representations for corpus-based bilingual retrieval. PhD thesis, University of Maryland, Baltimore County, 2008.

[115] P. McNamee and J. Mayfield. Comparing cross-language query expansion techniques by degrading translation resources. In K. Jarvelin, M. Beaulieu, R. A. Baeza-Yates, and S. Myaeng, editors, SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 11-15, 2002, Tampere, Finland, pages 159–166. ACM, 2002.

[116] E. Menard and V. Girouard. Image retrieval with SINCERITY: A search engine designed for our multilingual world! OCLC Syst. Serv., 31(4):204–218, 2015.

[117] D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 472–479, 2005.

[118] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In Y. Bengio and Y. LeCun, editors, 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013.

[119] T. Mikolov, Q. V. Le, and I. Sutskever. Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168, 2013.

[120] T. Mitamura, H. Shima, T. Sakai, N. Kando, T. Mori, K. Takeda, C.-Y. Lin, C.-J. Lin, and C.-W. Lee. Overview of the NTCIR-8 ACLIA tasks: Advanced cross-lingual information access. In Proceedings of the 8th NTCIR Workshop Meeting, 2010.

[121] M. Mitra and P. Majumdar. FIRE: Forum for information retrieval evaluation. In Proceedings of the 2nd Workshop on Cross Lingual Information Access (CLIA): Addressing the Information Need of Multilingual Societies, 2008.

[122] S. Nair, P. Galuscakova, and D. W. Oard. Combining contextualized and non-contextualized query translations to improve CLIR. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 1581–1584, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450380164.

[123] S. Nair, A. Ragni, O. Klejch, P. Galuscakova, and D. Oard. Experiments with cross-language speech retrieval for lower-resource languages. In AIRS, pages 145–157, 2020. ISBN 978-3-030-42834-1.

[124] R. P. Neco and M. L. Forcada. Asynchronous translations with recurrent neural nets. In Proceedings of International Conference on Neural Networks (ICNN’97), volume 4, pages 2535–2540. IEEE, 1997.

[125] J.-Y. Nie. Cross-language information retrieval. Synthesis Lectures on Human Language Technologies, 3(1):1–125, 2010.

[126] V. Nikoulina and S. Clinchant. Domain adaptation of statistical machine translation models with monolingual data for cross lingual information retrieval. In European Conference on Information Retrieval, pages 768–771. Springer, 2013.

[127] V. Nikoulina, B. Kovachev, N. Lagos, and C. Monz. Adaptation of statistical machine translation model for cross-lingual information retrieval in a service context. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 109–119, Avignon, France, Apr. 2012. Association for Computational Linguistics.

[128] NIST. The Official Original Derivation of AQWV, 2017. URL https://www.nist.gov/system/files/documents/2017/10/26/aqwv_derivation.pdf.

[129] R. Nogueira and K. Cho. Passage re-ranking with BERT. arXiv preprint arXiv:1901.04085, 2019.

[130] G. M. D. Nunzio, N. Ferro, G. J. F. Jones, and C. Peters. CLEF 2005: Ad hoc track overview. In C. Peters and N. Ferro, editors, Working Notes for CLEF 2005 Workshop co-located with the 9th European Conference on Digital Libraries (ECDL 2005), Wien, Austria, September 21-22, 2005, volume 1171 of CEUR Workshop Proceedings. CEUR-WS.org, 2005.

[131] D. W. Oard and A. R. Diekema. Cross-language information retrieval. Annual Review of Information Science and Technology (ARIST), 33:223–256, 1998.

[132] D. W. Oard and F. C. Gey. The TREC 2002 Arabic/English CLIR track. In E. M. Voorhees and L. P. Buckland, editors, Proceedings of The Eleventh Text REtrieval Conference, TREC 2002, Gaithersburg, Maryland, USA, November 19-22, 2002, volume 500-251 of NIST Special Publication. National Institute of Standards and Technology (NIST), 2002.

[133] D. W. Oard, J. Wang, G. J. Jones, R. W. White, P. Pecina, D. Soergel, X. Huang, and I. Shafran. Overview of the CLEF-2006 cross-language speech retrieval track. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 744–758. Springer, 2006.

[134] W. Ogden, J. Cowie, M. Davis, E. Ludovik, S. Nirenburg, H. Molina-Salgado, and N. Sharples. Keizai: An interactive cross-language text retrieval system. In Proceedings of the MT SUMMIT VII Workshop on Machine Translation for Cross Language Information Retrieval, volume 416, 1999.

[135] W. C. Ogden and M. W. Davis. Improving cross-language text retrieval with human interactions. In 33rd Annual Hawaii International Conference on System Sciences (HICSS-33), 4-7 January, 2000, Maui, Hawaii, USA. IEEE Computer Society, 2000.

[136] J. R. M. Palotti, G. Zuccon, Jimmy, P. Pecina, M. Lupu, L. Goeuriot, L. Kelly, and A. Hanbury. CLEF 2017 task overview: The IR task at the eHealth evaluation lab - evaluating retrieval methods for consumer health search. In L. Cappellato, N. Ferro, L. Goeuriot, and T. Mandl, editors, CLEF (Working Notes), volume 1866 of CEUR Workshop Proceedings. CEUR-WS.org, 2017.

[137] P. Pecina, P. Hoffmannova, G. Jones, Y. Zhang, and D. Oard. Overview of the CLEF-2007 cross-language speech retrieval track. In CLEF, pages 674–686, 2007.

[138] C. Peters. What happened in CLEF 2004? In CLEF, pages 919–919, 2005.

[139] C. Peters, M. Braschler, and P. Clough. Multilingual Information Retrieval: From Research To Practice. Computer Science. Springer Berlin Heidelberg, 2012.

[140] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana, June 2018. ACL.

[141] V. Petras and S. Baerisch. The domain-specific track at CLEF 2008. In C. Peters, T. Deselaers, N. Ferro, J. Gonzalo, G. J. F. Jones, M. Kurimo, T. Mandl, A. Penas, and V. Petras, editors, Evaluating Systems for Multilingual and Multimodal Information Access, pages 186–198, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. ISBN 978-3-642-04447-2.

[142] V. Petras, S. Baerisch, and M. Stempfhuber. The domain-specific track at CLEF 2007. In C. Peters, V. Jijkoun, T. Mandl, H. Muller, D. W. Oard, A. Penas, V. Petras, and D. Santos, editors, Advances in Multilingual and Multimodal Information Retrieval, pages 160–173, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg. ISBN 978-3-540-85760-0.

[143] D. Petrelli and E. Not. User-centred design of flexible hypermedia for a mobile guide: Reflections on the HyperAudio experience. User Model. User Adapt. Interact., 16(1):85–86, 2006.

[144] A. Pirkola. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval. In W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, editors, SIGIR ’98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 24-28 1998, Melbourne, Australia, pages 55–63. ACM, 1998.

[145] R. Rahimi, A. Montazeralghaem, and A. Shakery. An axiomatic approach to corpus-based cross-language information retrieval. Inf. Retr. J., 23(3):191–215, 2020. URL https://doi.org/10.1007/s10791-020-09372-2.

[146] P. Resnik, D. Oard, and G. Levow. Improved cross-language retrieval using backoff translation. In Proceedings of the First International Conference on Human Language Technology Research, 2001.

[147] G. Rosemblat, D. Gemoets, A. C. Browne, and T. Tse. Machine translation-supported cross-language information retrieval for a consumer health resource. AMIA Symposium, pages 564–568, 2003.

[148] C. Rubino. The effect of linguistic parameters in cross-language information retrieval performance: Evidence from IARPA's MATERIAL program. In Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, pages 1–6, 2020.

[149] A. Ruckle, K. Swarnkar, and I. Gurevych. Improved cross-lingual question retrieval for community question answering. In The World Wide Web Conference, WWW ’19, pages 3179–3186, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450366748.

[150] S. Ruder, I. Vulic, and A. Søgaard. A survey of cross-lingual word embedding models. Journal of Artificial Intelligence Research, 65:569–631, 2019.

[151] T. Sakai. On Fuhr's guideline for IR evaluation. SIGIR Forum, 54(1):1–8, 2020.

[152] T. Sakai, N. Kando, C.-J. Lin, T. Mitamura, H. Shima, D. Ji, K.-h. Chen, and E. Nyberg. Overview of the NTCIR-7 ACLIA IR4QA task. In NTCIR, 2008.

[153] T. Sakai, D. W. Oard, and N. Kando. Evaluating information retrieval and access tasks: NTCIR's legacy of research impact, 2020.

[154] S. Saleh and P. Pecina. Reranking hypotheses of machine-translated queries for cross-lingual information retrieval. In International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF), volume 9822, pages 54–66, 2016. ISBN 978-3-319-44563-2.

[155] S. Saleh and P. Pecina. An extended CLEF eHealth test collection for cross-lingual information retrieval in the medical domain. In L. Azzopardi, B. Stein, N. Fuhr, P. Mayr, C. Hauff, and D. Hiemstra, editors, Advances in Information Retrieval, pages 188–195, Cham, 2019. Springer International Publishing. ISBN 978-3-030-15719-7.

[156] S. Saleh and P. Pecina. Document translation vs. query translation for cross-lingual information retrieval in the medical domain. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6849–6860, Online, July 2020. ACL.

[157] S. M. Sarwar, H. Bonab, and J. Allan. A multi-task architecture on relevance-based neural query translation. In Proceedings of ACL, pages 6339–6344, Florence, Italy, July 2019. Association for Computational Linguistics.

[158] S. Sasaki, S. Sun, S. Schamoni, K. Duh, and K. Inui. Cross-lingual learning-to-rank with shared representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 458–463, New Orleans, Louisiana, June 2018. Association for Computational Linguistics.

[159] Y. Sasaki, H.-H. Chen, K.-H. Chen, and C.-J. Lin. Overview of the NTCIR-5 cross-lingual question answering task (CLQA1). In Proceedings of NTCIR, pages 175–185, 2005.

[160] Y. Sasaki, C.-J. Lin, K.-H. Chen, and H.-H. Chen. Overview of the NTCIR-6 cross-lingual question answering (CLQA) task. In NTCIR, 2007.

[161] S. Schamoni, F. Hieber, A. Sokolov, and S. Riezler. Learning translational and knowledge-based similarities from relevance rankings for cross-language retrieval. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 488–494, Baltimore, Maryland, June 2014. ACL.

[162] P. H. Schonemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31(1):1–10, 1966.

[163] M. Schuster and K. Nakajima. Japanese and Korean voice search. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5149–5152, 2012.

[164] R. Sennrich, B. Haddow, and A. Birch. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany, Aug. 2016. ACL.

[165] P. Sheridan and J. P. Ballerini. Experiments in multilingual information retrieval using the SPIDER system. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 58–65, 1996.


[166] P. Sheridan, M. Wechsler, and P. Schauble. Cross-language speech retrieval: Establishing a baseline performance. SIGIR Forum, 31(SI):99–108, July 1997. ISSN 0163-5840.

[167] P. Shi and J. Lin. Cross-lingual relevance transfer for document retrieval, 2019.

[168] H.-C. Shing, J. Barrow, P. Galuscakova, D. W. Oard, and P. Resnik. Unsupervised system combination for set-based retrieval with expectation maximization. In F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Muller, D. E. Losada, G. Heinatz Burki, L. Cappellato, and N. Ferro, editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction, pages 191–197, Cham, 2019. Springer International Publishing. ISBN 978-3-030-28577-7.

[169] S. L. Smith, D. H. P. Turban, S. Hamblin, and N. Y. Hammerla. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.

[170] A. Søgaard, I. Vulic, S. Ruder, and M. Faruqui. Cross-lingual word embeddings. Synthesis Lectures on Human Language Technologies, 12(2):1–132, 2019.

[171] A. Sokolov, F. Hieber, and S. Riezler. Learning to translate queries for CLIR. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 1179–1182, 2014.

[172] M. Stempfhuber and S. Baerisch. The domain-specific track at CLEF 2006: Overview of approaches, results and assessment. In C. Peters, P. Clough, F. C. Gey, J. Karlgren, B. Magnini, D. W. Oard, M. de Rijke, and M. Stempfhuber, editors, Evaluation of Multilingual and Multi-modal Information Retrieval, pages 163–169, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg. ISBN 978-3-540-74999-8.

[173] S. Strassel, C. Cieri, A. Cole, D. Dipersio, M. Liberman, X. Ma, M. Maamouri, and K. Maeda. Integrated linguistic resources for language exploitation technologies. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy, May 2006. European Language Resources Association (ELRA).

[174] S. Sun and K. Duh. CLIRMatrix: A massively large collection of bilingual and multilingual datasets for cross-lingual information retrieval. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4160–4170, Online, Nov. 2020. ACL.

[175] E. Tang, S. Geva, A. Trotman, Y. Xu, and K. Itakura. Overview of the NTCIR-9 crosslink task: Cross-lingual link discovery. In N. Kando, D. Ishikawa, and M. Sugimoto, editors, Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, pages 437–463. National Institute of Informatics, Japan, 2011.


[176] L.-X. Tang, I.-S. Kang, F. Kimura, Y.-H. Lee, A. Trotman, S. Geva, and Y. Xu. Overview of the NTCIR-10 cross-lingual link discovery task. In NTCIR, 2013.

[177] F. Ture and E. Boschee. Learning to translate: A query-specific combination approach for cross-lingual information retrieval. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 589–599, 2014.

[178] F. Ture, J. Lin, and D. Oard. Combining statistical translation techniques for cross-language information retrieval. In Proceedings of COLING 2012, pages 2685–2702, Mumbai, India, Dec. 2012.

[179] A. Vallin, B. Magnini, D. Giampiccolo, L. Aunimo, C. Ayache, P. Osenova, A. Penas, M. de Rijke, B. Sacaleanu, D. Santos, and R. Sutcliffe. Overview of the CLEF 2005 multilingual question answering track. In C. Peters, F. C. Gey, J. Gonzalo, H. Muller, G. J. F. Jones, M. Kluck, B. Magnini, and M. de Rijke, editors, Accessing Multilingual Information Repositories, pages 307–331, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-45700-8.

[180] E. Voorhees and D. Harman. Overview of the sixth text retrieval conference (TREC-6). Information Processing & Management, 36:3–35, 2000.

[181] E. M. Voorhees and D. K. Harman, editors. Proceedings of The Eighth Text REtrieval Conference, TREC 1999, Gaithersburg, Maryland, USA, November 17-19, 1999, volume 500-246 of NIST Special Publication, 1999. National Institute of Standards and Technology (NIST).

[182] E. M. Voorhees, D. K. Harman, et al. TREC: Experiment and evaluation in information retrieval, volume 63. MIT Press, Cambridge, 2005.

[183] I. Vulic and M.-F. Moens. Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 363–372, 2015.

[184] I. Vulic, G. Glavas, R. Reichart, and A. Korhonen. Do we really need fully unsupervised cross-lingual embeddings? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4398–4409, 2019.

[185] C. L. Wayne. Multilingual topic detection and tracking: Successful research enabled by corpora and evaluation. In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, May 2000. European Language Resources Association (ELRA).

[186] R. W. White, D. W. Oard, G. J. F. Jones, D. Soergel, and X. Huang. Overview of the CLEF-2005 cross-language speech retrieval track. In C. Peters, F. C. Gey, J. Gonzalo, H. Muller, G. J. F. Jones, M. Kluck, B. Magnini, and M. de Rijke, editors, Accessing Multilingual Information Repositories, pages 744–759, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-45700-8.


[187] S. Wu. Data Fusion in Information Retrieval. Springer Publishing Company, Incorporated, 2012. ISBN 3642288650.

[188] C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 55–64, 2017.

[189] J. Xu and R. Weischedel. Cross-lingual information retrieval using hidden Markov models. In 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 95–103, Hong Kong, China, Oct. 2000. Association for Computational Linguistics.

[190] P. Yang, H. Fang, and J. Lin. Anserini: Reproducible ranking baselines using Lucene. J. Data and Information Quality, 10(4), Oct. 2018. ISSN 1936-1955.

[191] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le. XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237, 2019.

[192] L. Yao, B. Yang, H. Zhang, W. Luo, and B. Chen. Exploiting neural query translation into cross lingual information retrieval. arXiv preprint arXiv:2010.13659, 2020.

[193] M. Yarmohammadi, X. Ma, S. Hisamoto, M. Rahman, Y. Wang, H. Xu, D. Povey, P. Koehn, and K. Duh. Robust document representations for cross-lingual information retrieval in low-resource settings. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 12–20, Dublin, Ireland, Aug. 2019. European Association for Machine Translation.

[194] Z. A. Yilmaz, W. Yang, H. Zhang, and J. Lin. Cross-domain modeling of sentence-level evidence for document retrieval. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3481–3487, 2019.

[195] P. Yu and J. Allan. A study of neural matching models for cross-lingual IR. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, pages 1637–1640, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450380164.

[196] H. Zamani, M. Dehghani, W. B. Croft, E. Learned-Miller, and J. Kamps. From neural re-ranking to neural ranking: Learning a sparse representation for inverted indexing. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 497–506, 2018.

[197] I. Zavorin, A. Bills, C. Corey, M. Morrison, A. Tong, and R. Tong. Corpora for cross-language information retrieval in six less-resourced languages. In Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, pages 7–13, 2020.

[198] R. Zbib, L. Zhao, D. G. Karakos, W. Hartmann, J. DeYoung, Z. Huang, Z. Jiang, N. Rivkin, L. Zhang, R. M. Schwartz, and J. Makhoul. Neural-network lexical translation for cross-lingual IR from text and speech. In B. Piwowarski, M. Chevalier, E. Gaussier, Y. Maarek, J. Nie, and F. Scholer, editors, Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, pages 645–654. ACM, 2019.

[199] L. Zhang and X. Zhao. An overview of cross-language information retrieval. In X. Sun, J. Wang, and E. Bertino, editors, Artificial Intelligence and Security, pages 26–37, Cham, 2020. Springer International Publishing. ISBN 978-3-030-57884-8.

[200] L. Zhang, D. Karakos, W. Hartmann, M. Srivastava, L. Tarlin, D. Akodes, S. K. Gouda, N. Bathool, L. Zhao, Z. Jiang, R. Schwartz, and J. Makhoul. The 2019 BBN Cross-lingual Information Retrieval System. In Proceedings of the Cross-Language Search and Summarization of Text and Speech Workshop, pages 44–51, 2020.

[201] P. Zhang, L. Plettenberg, J. L. Klavans, D. W. Oard, and D. Soergel. Task-based interaction with an integrated multilingual, multimedia information system: A formative evaluation. In E. M. Rasmussen, R. R. Larson, E. G. Toms, and S. Sugimoto, editors, ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007, Vancouver, BC, Canada, June 18-23, 2007, Proceedings, pages 117–126. ACM, 2007.

[202] R. Zhang, C. Westerfield, S. Shim, G. Bingham, A. Fabbri, W. Hu, N. Verma, and D. Radev. Improving low-resource cross-lingual document retrieval by reranking with deep bilingual representations. In Proceedings of ACL, pages 3173–3179, Florence, Italy, July 2019.

[203] D. Zhou, M. Truran, T. Brailsford, V. Wade, and H. Ashman. Translation techniques in cross-language information retrieval. ACM Comput. Surv., 45(1), Dec. 2012. ISSN 0360-0300.