A Survey of Localisation in African Languages, and its Prospects

PanAfrican Localisation Project

A Background Document

Don Osborn, Ph.D.

21 February 2007


This document is produced as part of the PanAfrican Localisation project financed by IDRC.

This work is licensed under the Creative Commons Attribution 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

This document is also available online in wiki format with appendices at: http://www.panafril10n.org/wikidoc/pmwiki.php/PanAfrLoc/Document

Foreword

This document considers the current situation, and the needs and potential of localisation in African languages, including Arabic. Given the nature of the topic and the range of factors involved, it is extensive and challenging, yet incomplete. The geographic scope is enormous – the second largest continent – and is home to about a third of the world’s languages. Moreover, the entire continent, despite regional variations, is generally disadvantaged with regard to information and communication technology (ICT), and to resources for researching, adapting and extending the technology to the mass of the population.

In principle, ICT should be able to meet people in any language and serve as a tool for development in its fundamental and most comprehensive sense – the revealing of potentialities. But in the context where basic needs are often not met, health crises persist, literacy in any language is low, and many tongues do not have a set orthography, discussion of localising ICT in any form may seem like a luxury. It is however an expression of hope, of affirmation of the value and relevance of Africa’s linguistic and intellectual heritage, and a practical calculation that new tools can help find new solutions to old problems – perhaps in the very tongues and idioms most familiar to the disadvantaged.

The document is therefore a beginning with a direction. It consists of a survey document and an extensive set of appendices in wiki form (permitting ongoing online input) on languages, countries, scripts, interAfrican organisations, and localisation resources.

The overall object is to identify issues, concerns, priorities and lines of work as regards localisation in Africa. It also discusses current and potential areas of focus in localising in African languages.

The paper is organised around several thematic sections on language and ICT and then seeks to get specific about actors and activities in the five appendices. In order to help make sense of the processes of localisation, a model of "localisation ecology" is proposed as a way of accounting for various factors that impact current and possible localisation efforts.

Localisation is a popular topic internationally now, but it is neither a fad nor a passing fancy. Nevertheless, the observations of one writer, Peter Senge, who researched the fad cycle in business management, are worth noting. Senge (2006) observed that new ideas often go through a fairly predictable cycle: a period of interest, with a lot of activity and many people getting involved, followed by an inevitable slackening, during which the initial enthusiasm wanes and most people move on to other things. What distinguishes an idea that has an enduring effect or becomes institutionalised in a sustainable way from a passing fad that leaves little long-term effect, he argued, is the degree to which the idea is solidly backed up by or linked to theory. In the case of localisation of ICT in African languages, apart from some work indicating the importance of first languages as media for communication and learning, there is as yet little that articulates, at a theoretical level, the importance and utility of using Africa's indigenous languages for all levels of computing and the internet. So this document, in addition to reviewing activities and proposing practical measures, also takes a step, hopefully, towards defining localisation in the African context and explaining why it is important in the long term.


Contents

Foreword

1. Introduction

2. Background
2.1 Importance of African languages in ICT
2.2 What is Localisation?
2.3 Overlapping Regional Contexts: Localisation Where?
2.4 Who Localises?
2.5 What is the Current State of Localisation Across this Region?

3. Introducing "Localisation Ecology"
3.1 An Ecological Perspective of the Environment for Localisation
3.2 The "PLETES" Model
3.3 Dynamic Complexes within Localisation Ecology
3.4 Relevance to Questions of ICT and Localisation
3.5 Localisation Ecology and the "Digital Divide"

4. Linguistic Context
4.1 Languages, Dialects, and Linguistic Geography
4.2 Sociolinguistics and Language Change
4.3 Language and Language in Education Policies
4.4 Basic Literacy, Pluriliteracy, and User Skills
4.5 Terminology and Accommodation of ICT Concepts

5. Technical Context
5.1 Access: Physical and Soft
5.2 Infrastructure
5.3 Computer Hardware and Operating Systems
5.4 Connectivity and Internet Policy
5.5 Trends in Localisation
5.6 Trends in FOSS

6. Africa and the Internationalisation of ICT
6.1 The Facilitating Technical Environment
6.2 Handling Text: Unicode and Complex Script Requirements
6.3 Keyboards and Input Systems
6.4 Languages, ISO-639 and Locales
6.5 Internationalisation and the Web
6.6 Internationalized Domain Names
6.7 Other Applications

7. Current Localisation Activity
7.1 Evolution of African Language Use in ICT
7.2 Fonts
7.3 Keyboards for Africa
7.4 Locale Data for African Languages
7.5 Software and Operating Systems
7.6 Other applications
7.7 Facilitation of Discussion of Localisation

8. Needs for Sustainable Localisation
8.1 Needs by Kind of Localisation and Localiser
8.2 Understanding the Needs of Localisers
8.3 Analysis of needs from a PanAfrican Perspective

9. Summary and Recommendations
9.1 Major Themes
9.2 Strategic Perspectives
9.3 Conferences and Workshops
9.4 Training and Public Education on Localisation
9.5 Information Resources and Networking
9.6 Languages, Policy and Planning
9.7 Basic Localisation, and ICT Policies and Programs
9.8 Africa and ICT Standards for Localisation
9.9 Advanced Applications and Research

10. Conclusion

11. References

12. Appendices
12.1 Major Languages
12.2 Writing Systems
12.3 Countries
12.4 InterAfrican
12.5 Localisation Tools

Notes


1. Survey of Localisation in African languages: Introduction

With increasing numbers of computers and growing penetration of the internet around the world, localisation of the technology, and of the content it carries, into the many languages people speak is becoming an ever more important area for discussion and action. Localisation, simply put, includes translation and cultural adaptation of user interfaces and software applications, as well as creation and translation of internet content in diverse languages. Defined as such, localisation can be understood as essential to: making information and communication technology (ICT) more accessible to the populations of the poorer countries, the very peoples for whom the technology is supposed to offer new possibilities for advancing development; increasing its relevance to their lives, needs and aspirations; and ultimately bridging the "digital divide."

This document is part of a project – the PanAfrican Localisation (PAL) Project – that seeks to address localisation in two overlapping regions: Africa and the Arabic-speaking countries. Its main focus will be on sub-Saharan Africa and predominantly Arabic-speaking North Africa, while acknowledging the fundamental linguistic and cultural connections of the latter with the Arabic Middle East. As such, it is concerned with localisation of ICT in the languages particular to Africa and in Arabic (these will be referred to together as African languages except when there is a reason to treat Arabic separately1).

Africa, which is recognised today as both a continent struggling with aspects of its own development and one where the use of ICT lags behind that of most of the rest of the world, is beginning to see attention to localisation. This is gradual, with projects limited to certain regions, sometimes the result of personal initiatives, but generally without much in the way of organisation, resources, or long-range planning. In addressing this situation, this research paper and the PanAfrican Localisation Project are motivated by the intent to assist the region in making the most of ICT for development (ICT4D) through identifying ways to support effective and sustainable localisation.

The document therefore seeks to explore four sets of questions.

1. Why is localisation important? What are the barriers to greater use of African languages in computing and the internet? How do these affect the potential for localisation?

2. What is actually being done for localisation? By whom, for what languages and in which countries? What are the challenges and solutions that they encounter?

3. What future trends should we look for? What areas should be prioritised?

4. How do these relate to each other, and how do we address them in localisation work?

1 Aside from being the first language of a large population in northern Africa, Arabic is also a major world language with a significant number of speakers outside the continent, so some localisation issues involve large markets and can draw on significant and diverse resources.


To accomplish this, it is necessary to consider the situations on the ground: from information on languages, speakership, and language and educational policy, to basic information on the current ICT situation, policy, plans, and initiatives, including what is being done already in localisation, where and by whom. Two broad areas – language and technology, as well as their relationship to the social and cultural context – represent the fundamental preoccupations of localisation, but there are other factors that also need to be taken into account.

It is one of the premises of this document that a broader view of apparent and expressed needs can put this information into context and more fully inform programs to assist localisers and ICT4D projects. Understanding the basic information is mainly a matter of drawing on existing research on languages and ICT in Africa. This provides the context for discussion and planning on localisation.

Uncovering what is being done in localisation is more difficult – at least to do systematically – as such activities are often not publicised and out of the view of others (even sometimes in the same country) who might be interested in knowing about them. From there, attention is needed to identifying trends and potentialities – that is, where localisation might be headed and what that means for the future. ICT is a new and unavoidable fact of life in Africa, no less than in other regions, though it is so in ways particular to the needs of each area. Africa is one of the most multilingual regions of the world, so the meeting of technology and language would seem to be of great consequence for development there, even though that fact does not yet get the attention it deserves.

A further task is to determine what sort of information resources and practical skills are needed to assist and facilitate localisation work on the ground in Africa. It is hoped that these findings will contribute to the evolution of the localisation resource website that this project is putting together.

This in turn relates to the visions one may formulate about where technological change is and could be leading, since the evolution of ICT is constant and rapid, and the object of localising and utilising it for development cannot be limited forever to catching up with practice and applications in other world regions.

Beyond that, and returning to basic realities one encounters on the ground in Africa, one cannot separate the tasks and objects of localisation from the larger development and education efforts, policy contexts, and socioeconomic dynamics at play on the continent. This is especially the case as one considers on the one hand, the sustainability of and long term planning for localisation, and on the other, the role of localisation of ICT in addressing larger problems of development.

In order to achieve the goals of this presentation, therefore, one must always keep in mind the main components of localisation mentioned above – language, technology, and their socio-cultural context – as well as the relationships among those and several other factors that affect the possibilities for and the actual implementation of localisation (factors which in turn are affected by the process and achievement of localisation).


For that set of relationships, this paper will introduce the concept of "localisation ecology" to account for the key factors, facilitate discussion of their interaction, and call attention to how planning and implementing localisation can and should consider these.

The document is organised in several sections. Following some background on the topic, localisation ecology and a model to conceptualise it are presented. The next two sections consider the linguistic and technical contexts of localisation in Africa. The sixth section discusses how internationalisation of ICT relates to Africa and African languages, and the seventh deals with current and potential localisation activity. The eighth section summarises needs; it is followed by a summary with recommendations, which includes a discussion of a web-based resource for localisers and other project possibilities, and finally by the conclusion.


2. Background

This section introduces the importance of African language use in ICT, defines localisation as used in this paper, discusses the regional context of the research, and outlines how the state of localisation in Africa will be discussed.

2.1 Importance of African languages and ICT

As the information revolution worldwide becomes increasingly multilingual, and as the presence of the new ICTs in Africa extends to larger areas beyond the capital cities, there is an ever greater need to accommodate use of diverse African languages and greater potential to tap the linguistic wealth of the continent for development and education.

It is generally agreed that availability of software and content in the language(s) most familiar to users is an essential element in their adoption and optimal use of computers and the Internet. One might add that in a context where people speak several languages – as one often finds in Africa – the option to use different languages is also empowering.

Accommodating people’s most familiar languages is, moreover, a consideration of primary importance for any effort to use ICT for development. This should come as no surprise, as education and communication in first languages are generally easier for people than in languages they acquire later. Also, at the community or societal level, first languages are considered an indispensable and central aspect of social and cultural systems.2

However, ICT has been introduced to Africa and Arabic-speaking regions in English, French, and in some countries south of the Sahara, Portuguese and Spanish – the same languages of European origin that were used in colonisation of these regions, which have served as official languages since their independence (especially south of the Sahara), and which also serve as what will be referred to here as "European (or Europhone) languages of wider communication" (ELWCs).3 One problem with reliance on ELWCs is that a large majority of people on the continent either do not speak these languages or do not speak them well.4

2 This observation is frequently made. Herbert (1992:1) is among the recent sources.

3 The term "European language of wider communication" (ELWC) was introduced by Eyamba G. Bokamba (1995). "Europhone" is a more recent coinage, sometimes used to refer to European languages and speakers of them in Africa. "Language of wider communication" (LWC) is an established term that refers to any language used vehicularly, generally in contexts where it is a second or additional language. Many African languages including Arabic serve as LWCs or as local linguae francae. ELWCs of course dominate in web content and software worldwide.

4 According to one estimate, up to 90% of the people in some countries do not speak the official languages (Mackey 1989:5, quoted in Robinson 1996:5).


At the same time the sheer number and diversity of languages on the continent – over 2000 languages according to Ethnologue (Gordon, ed., 2005), which represents about a third of all living languages in the world – poses a challenge for localisation efforts and indeed educational programs that would support them. The fact that many of what are counted as separate languages also fall into clusters of very closely related and interintelligible tongues shows that Africa’s linguistic complexity has many dimensions.

Initiatives aiming to expand the use of ICT in Africa for development, education, or other purposes are beginning to recognise the necessity of responding to these sociolinguistic realities. Such efforts are also benefiting from advances in internationalisation of the technology, from greater use of Unicode (ISO/IEC 10646)5 for handling diverse scripts and extended characters, to the availability of utilities for creating keyboard layouts.

However, there are still a number of hurdles. Some are technical, relating for instance to use of extended character sets and Unicode on older computer systems and European keyboards. Some hurdles relate to economic factors such as cost for translation of content. Others are social, relating to education levels, and there are also sometimes negative attitudes towards African languages among foreign development and education experts and even native language speakers.6 Also, there are countries in which government language and education policies disfavour African languages, which in turn has an impact on ICT usage.

In any event, the use of ICT in Africa's indigenous languages should not be seen merely as compensation for people lacking knowledge of ELWCs, let alone as a second-best or interim solution for such people until knowledge of ELWCs becomes more widespread.7 It is also a question of fairness in access; a long-term practical issue (since it is hard to imagine that Africans, any more than the populations of any other region, will universally be as comfortable or efficient using ELWCs in ICT to the exclusion of their first languages); and a solution that opens up new possibilities for more effective use of the technology by even the most highly educated, thus complementing and expanding upon the potential offered by applications in ELWCs.

5 ISO is the International Organization for Standardization. In effect, this standards organisation and the Unicode Consortium, which began as an industry association, coordinated their efforts in the early 1990s to produce a single coding system. It is sometimes called the "Universal Character Set" (UCS) but is commonly referred to simply as Unicode. This paper will follow the latter practice.

6 This is a subject that cannot be treated in depth here but merits a brief elaboration. Minimisation of the value of all aspects of indigenous cultures in Africa was a fundamental feature of European and North American interaction with Africa for centuries during which the slave trade and colonisation were rationalised. But while such attitudes are no longer acceptable today, and indeed there is a greater appreciation of African cultures elsewhere in the world, African languages have had little value attributed to them outside the limited circles of linguistic specialists. As late as the 1970s, a major introductory text on Africa gave little attention to African languages other than to suggest their future was in doubt (Bohannan and Curtin 1971; this statement was modified in later editions – see Bohannan and Curtin 1995). Chaudenson (2004) notes that the subject of language has been almost entirely absent from the discourse on development in Africa. And Brock-Utne (2005) calls attention to the negative attitude towards multilingualism in Africa among foreign donors, who see it as a "hindrance" to development.

7 In education and literacy training in Africa, one strategy has been to use instruction in first languages primarily as a "bridge" to learning in the official language (this is sometimes called a "subtractive bilingual" approach). Localisation of ICT in this report is not conceived with such a limited end in mind, although it is certainly true that people who learn computer use in a more familiar language would be able to acquire computer skills in an additional language more readily.

2.2 What is localisation?

The term "localisation" is used in various contexts relating to ICT, but the definitions revolve around the adaptation of user interfaces and digital information to the local modes of communication, culture and standards. Daniel Yacob (2004) offers a broad interpretation that defines the object of localisation: "the transfer of cultural consciousness into a computer system, making the computer a natural extension of the society it serves."

"Localisation" is a concern that was arguably inherent or latent in computer technology from its very beginning. In other words, it was inevitable that computing would eventually enable the handling of human language, and that users from diverse linguistic backgrounds would then raise questions about the choice of languages and call for the use of additional ones. Then, as computers became able to convey images, sounds, and styles of presentation more readily, issues of cultural appropriateness would naturally follow.

In practice, localisation is both a technical set of approaches and techniques for adapting software and content to particular languages and cultures, and also, more broadly, an enterprise activity that incorporates those technical dimensions, planning, linguistic information, and the organisation necessary to make it happen. Altogether localisation aims at facilitating use of target languages in ICT and can further be understood as an active component of wider efforts to adapt science and technology to diverse societies and cultures.

Localisation as a technical concern

Computer systems, and ICT in general, involve two levels of consideration: hardware and bits (binary encoding). Together these define the technical possibilities for localisation.

At its simplest, the hardware side of ICT can be understood as involving devices and connections. The devices – computers but also increasingly powerful handheld devices – can operate independently for certain purposes including storage and manipulation of data, like text, spreadsheets, and other files. They also can connect to a network that links to other devices – the internet – for retrieval and exchange of information (email, webpages, streaming media). Localisation relates to both of these aspects (independent and networked).

For users to make use of the technology, the bits in which information is encoded and manipulated at the most basic level, and the software tools written to facilitate that, must be organised in forms that permit interaction with the hardware and network, as well as storage and transmission of information. In other words, there are two aspects: interface (accessing and using the technology) and information content (documents, data, etc.).
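As a rough illustration of how text is reduced to bits for storage and transmission, the following Python sketch encodes a string as UTF-8 (one common Unicode encoding, chosen here as an assumption; others exist) and shows each resulting byte as a bit pattern. The function names are illustrative only.

```python
def to_bits(text: str, encoding: str = "utf-8") -> list[str]:
    """Encode text and return each resulting byte as an 8-digit bit string."""
    return [f"{byte:08b}" for byte in text.encode(encoding)]


def round_trip(text: str, encoding: str = "utf-8") -> str:
    """Simulate storing or transmitting text as bytes, then reading it back."""
    stored = text.encode(encoding)  # what is written to disk or sent over the network
    return stored.decode(encoding)  # what the receiving system reconstructs


if __name__ == "__main__":
    print(to_bits("a"))        # ['01100001']: a plain ASCII letter fits in one byte
    print(to_bits("ɛ"))        # ['11001001', '10011011']: open e (U+025B) needs two
    print(round_trip("ɛɔŋ"))   # ɛɔŋ
```

An unaccented Latin letter fits in a single byte, while an extended character such as ɛ needs two in UTF-8, which hints at why systems built around single-byte text struggle with many African orthographies.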

Table 1 cross-indexes these two levels: the two fundamental categories of hardware and the two fundamental kinds of use to which the technology is put. Considering the two in a matrix makes it easier to understand the aspects of ICT that concern localisation.

| | Interface/Access (how we interact with the technology) | Information/Storage (what we use the technology for) |
|---|---|---|
| Computer (individual piece of hardware) | Operating system, software for various purposes, keyboard, display | Documents and files of various sorts, created by user(s) |
| Internet (the network of connections) | (The above plus...) specialised software resident on servers, such as search engines and databases | Web content, remote storage |

Table 1: Dimensions for localisation

From this analysis we can identify three separate but overlapping concerns. These are listed and then discussed below:

• Equipping systems deployed in various localities – or actualising their existing capacities – to handle local language needs. This facilitates production of documents and also display of multilingual web content.

• Production of web content for diverse audiences in languages and formats that they can understand.

• Localisation of user interfaces on individual devices and the internet.

All three of these concerns are a focus of the PAL project, but the localisation of interfaces (particularly software) is pivotal, as it is both the logical extension of efforts to equip systems to handle local language needs and a potential facilitator of the production of localised content.

Equipping systems: This is mainly a matter of actualising the potential of computer systems to handle local languages in various ways, notably non-ASCII text.8 The main issues are fonts, input and display.
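A first practical question for a given language is which of its characters fall outside ASCII and whether an installed font actually covers them. The sketch below assumes the font's coverage is already available as a set of code points (in practice it might be read from the font's character map, for example with the fontTools library's `TTFont(path).getBestCmap()`); the coverage set and the sample string here are hypothetical.

```python
def non_ascii_inventory(text: str) -> list[str]:
    """Distinct characters of text beyond the 7-bit ASCII range, by code point."""
    return sorted({ch for ch in text if ord(ch) > 0x7F}, key=ord)


def missing_from_font(text: str, font_codepoints: set[int]) -> list[str]:
    """Non-ASCII characters of text that the font cannot display."""
    return [ch for ch in non_ascii_inventory(text) if ord(ch) not in font_codepoints]


# Hypothetical font covering only printable ASCII plus the Latin-1 Supplement.
LATIN1_FONT = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

if __name__ == "__main__":
    sample = "aɛbɔcŋdé"  # arbitrary character sample, not a real sentence
    print(non_ascii_inventory(sample))             # ['é', 'ŋ', 'ɔ', 'ɛ']
    print(missing_from_font(sample, LATIN1_FONT))  # ['ŋ', 'ɔ', 'ɛ']
```

A font built for Western European languages handles é but lacks ŋ, ɔ and ɛ, which is the kind of gap that makes a "complete" font a precondition for display.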

Many languages of Africa are written with an extended Latin script, and a number of others, like Arabic, use non-Latin scripts. For all of these languages (unlike another group of languages, including many in Southern and Eastern Africa, that use basically the same character set as the languages of Western Europe), the advent of Unicode represents a new era of possibilities. However, basics such as an adequate choice of complete fonts and standardised, user-friendly input (mainly keyboard layouts, but eventually also speech recognition software9) are necessary. The first step of localisation for these languages is in effect this "last mile" of internationalisation (which in turn refers to the process of improving computers and systems so that they can accommodate the diverse language needs of the world).

8 The focus here is mainly on the written languages, but it is important to acknowledge the importance of audio and non-text images – whether alone or in combination with text – in localisation and multilingual computing. These include some applications that will be discussed later.

9 We will mention keyboards briefly in this section. A more in-depth treatment, including discussion of speech recognition and speech-to-text, is given below (section 7.3).

Fonts are really the first issue, since without fonts that include the necessary extended characters or non-Latin scripts, software applications will not fully or correctly display many languages. This means Unicode fonts – fonts in which all characters are encoded according to the Unicode standard – since text created with legacy 8-bit fonts, while it may display the characters and diacritics of whatever language(s) those fonts were designed for, is not readable on systems that lack those particular fonts. Basically, 8-bit fonts are not intercompatible: each assigns the limited set of 256 codepoints to characters in its own way, whereas Unicode in principle provides a single codepoint for each character in every writing system (this is discussed further below, 6.2).
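The incompatibility of 8-bit schemes can be seen with Python's standard codecs. The point above concerns legacy fonts, which remap glyphs inside the font itself; standard 8-bit code pages show the same effect and are used here as a stand-in, under that assumption. The same byte yields a different character depending on which 8-bit table is assumed, whereas a Unicode code point identifies one character on every system.

```python
import unicodedata


def decode_under_each(byte_value: int, encodings: list[str]) -> dict[str, str]:
    """Interpret one byte under several 8-bit encodings."""
    raw = bytes([byte_value])
    return {enc: raw.decode(enc) for enc in encodings}


def describe(text: str) -> list[str]:
    """Give the Unicode code point and official name of each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    # Byte 0xE9 means something different in each legacy table
    # (Western European, Cyrillic and Greek code pages here).
    print(decode_under_each(0xE9, ["latin-1", "cp1251", "cp1253"]))
    # Under Unicode, each character has exactly one code point.
    for line in describe("ɛŋ"):
        print(line)
```

A document saved under one 8-bit scheme and opened under another therefore shows the wrong characters without any error, which is the practical problem Unicode removes.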

For input of text in languages that use non-ASCII characters, specialised keyboard layouts are also necessary, and these may be created for languages or groups of languages for which there does not yet exist localised software. Beyond capacities to handle text, the capacity of systems to permit users to create and use multimedia that does not rely solely on text is another important, though sometimes overlooked, consideration.
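Keyboard layouts and input methods are, at bottom, mappings from keystrokes to characters. The following sketch models a simple sequence-based input method in which a prefix key followed by a base letter produces an extended character; the `;` prefix and the table are hypothetical and do not describe any published layout.

```python
# Hypothetical two-keystroke sequences; not taken from any published layout.
SEQUENCES = {";e": "ɛ", ";o": "ɔ", ";n": "ŋ", ";b": "ɓ", ";d": "ɗ"}


def apply_input_method(keystrokes: str, table: dict[str, str]) -> str:
    """Convert a stream of ASCII keystrokes into text using the sequence table."""
    out: list[str] = []
    i = 0
    while i < len(keystrokes):
        pair = keystrokes[i:i + 2]
        if pair in table:   # prefix plus base letter: emit the extended character
            out.append(table[pair])
            i += 2
        else:               # any other keystroke passes through unchanged
            out.append(keystrokes[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_input_method("k;o;n", SEQUENCES))  # kɔŋ
    print(apply_input_method("a;", SEQUENCES))     # a; (incomplete sequence kept as typed)
```

Real layouts would normally be built with dedicated layout-creation utilities and installed at the operating-system level; the sketch only shows the underlying mapping idea that lets a standard European keyboard produce non-ASCII characters.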

At the same time it is recognised that there do exist many older computers in Africa, often the result of donations of used equipment, whose systems cannot handle Unicode and may be limited in other aspects. (See below, 5.1)

Content: "Content" is usually taken to mean "web content" – the information conveyed on the pages of sites on the World Wide Web (WWW). More broadly, we may take it to include information stored as documents or data on computers or conveyed over the internet by other means, such as e-mail. The latter is of interest in measuring the use of and demand for ability to use diverse languages.

The production and display of information via the web that is relevant and accessible to users is facilitated by the considerations under the previous point, above. Choice of languages is obviously a key consideration, since it is via comprehensible idioms that ICT can convey information for development or other purposes. Additional considerations such as the cultural appropriateness of themes and images, and the approach to communication within the language (dialects, contemporary vs. formal styles, etc.) are also important aspects of localised content.

One can divide localised web content into two parts based on origin: content produced locally and content produced elsewhere but targeted at the local audience. Both are important, but our main concern is the former. Ballantyne (2002) discusses content in terms of this division and also in terms of the target of the content. It may be useful to foster collaboration between local and international content developers, wherever they are based, on how best to address their common intended audience.

At present there is little web content developed in African languages either in Africa or elsewhere, and relatively little content of any sort coming from Africa. The issue of localised content is discussed in more detail below (see 7.1).


Localisation of user interfaces: This of course includes the translation of basic computer software such as browsers and word processors into different languages, including commands, dictionaries, help files, etc. (the capacities of software to handle diverse language needs are considered under the first point above). In addition to translation, other issues such as conventions for the display of certain information and the cultural appropriateness of themes used in the software are also important considerations. In many cases, localisation may involve some but not all of the above. Localisation in Arabic, and to a very limited degree in some of the more widely spoken African languages, is being undertaken by commercial software companies, notably Microsoft. While this helps the overall picture of localisation, it concerns only major languages and large markets. It may also entail higher costs than users can bear, especially in ICT4D contexts. Therefore, in the interests of including more languages and local expression, and of lower-cost solutions, free/open-source software (FOSS) is the focus of this project.
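Translation of an interface is usually organised around catalogues of user-visible strings, with a fallback when a string has not yet been translated; this is one way in which localisation can cover "some but not all" of an interface. The sketch below uses plain dictionaries as a simplified stand-in for catalogue formats such as GNU gettext PO files; the keys and the placeholder language code `xx` are hypothetical.

```python
# Hypothetical catalogues: "xx" stands for a target language, and its strings
# are placeholders rather than real translations.
CATALOGUES: dict[str, dict[str, str]] = {
    "en": {"menu.file": "File", "menu.save": "Save", "menu.help": "Help"},
    "xx": {"menu.file": "[xx] File", "menu.save": "[xx] Save"},
}


def translate(key: str, lang: str, fallback: str = "en") -> str:
    """Return the interface string for lang, or the fallback language's string."""
    localised = CATALOGUES.get(lang, {}).get(key)
    return localised if localised is not None else CATALOGUES[fallback][key]


if __name__ == "__main__":
    for key in ("menu.file", "menu.save", "menu.help"):
        print(key, "->", translate(key, "xx"))
    # menu.help is untranslated in "xx", so it falls back to "Help".
```

Keeping user-visible strings out of the program logic is what allows localisers who are not programmers to translate an interface incrementally.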

Localisation as project

Localisation in its broader sense of a process and enterprise takes into consideration several other matters such as:10

• factors necessary to localise:
   o a standardised orthography
   o locale data
   o organisation and resources to accomplish localisation in the stricter technical sense

• aspects of sustainability in the long term:
   o follow-through and marketing of localised software
   o follow-up with the user community and for updates

• attention to issues of user skills (from basic literacy to computer literacy)

• impacts of localisation of ICT on other aspects of society, economy and culture.

All of these are important to take into consideration. A framework to facilitate this is proposed in Section 3.

2.3 Overlapping Regional Contexts: Localisation Where?

Africa

Africa is a multilingual continent, but there is no software, or even internet content, in the vast majority of its many languages, even in most of the major and more widely spoken ones. Every country on the continent has some linguistic diversity, resulting from the history of population movements and the overlay of colonial languages. Most countries, especially south of the Sahara, have no single majority language. A few countries have scores or even hundreds of languages. The ELWCs introduced during colonisation (English, French, and Portuguese) serve as official languages and facilitate communication to one degree or another across wide areas, but are primarily second languages of the more educated and urbanised segments of society, and do not have the same connection with African cultures as indigenous African languages.11 Most of the people who have less facility in the ELWCs are in rural areas, and they include a higher percentage of women than of men.

10 Even more broadly, on a "meta" level, one might also include development of tools to facilitate the process of localisation. This is different from the internationalisation of the technology. Such tools are discussed below in Section 5.4.

Software and content in ELWCs, therefore, cannot satisfy the needs of the majority of the African population, and even in the limited (mainly urban) locations where they might, many people would effectively still be excluded linguistically, and everyone would have their language options restricted by the lack of content and software in African languages. In addition, any effort to use ICT for development purposes would be hindered to the extent that the working languages are limited to ELWCs. Put another way, the need for localisation in Africa corresponds to the hopes for ICT to play a full and effective role in development on the continent.

Currently, as the number of computers and the quality of internet connections on the continent increase, so too does interest in localising software and content in African languages. This is not only for reasons related to development, naturally, but also for the same reasons one sees such interest elsewhere. The amount of material in African languages on the internet is increasing slowly, and there are active efforts to localise software, particularly in South Africa, Eastern Africa, and Nigeria. However, there is limited connection among these efforts, and little knowledge of them beyond a few specialists.

This comes at a time, though, of increased interest in localisation around the world, among both commercial software companies on the one hand and the FOSS movement on the other. Given the need, the budding local interest, and the international resources potentially available, the time is opportune to facilitate localisation in Africa.

Arabic-speaking world

The need for localisation in Arabic is very real but of a different nature, even though many countries in this region were also colonised, had a similar overlay of English and French languages, and were first introduced to ICT in those tongues. Unlike the case with sub-Saharan Africa and its languages, however, there is by now already a significant amount of localised software and content in Arabic, as one would expect for a major international language. Also, the Arabic-speaking world in general, including the countries of North Africa, has better infrastructures and ICT indicators than Africa south of the Sahara. So the challenges in this region are less daunting than in Sub-Saharan Africa.

Nevertheless, the range of localised software is still arguably limited, and there is not a corresponding level of localisation dealing with local themes and idioms. Building the capacity of developers to localise Arabic software and content for their diverse user communities, particularly those outside of the major cities, is a goal in such cases, and will be the focus of the project’s work on this language. There is also a need to produce Arabic electronic dictionaries for FOSS applications like OpenOffice – these could be localised to countries, much as English or French dictionaries differ among locales.
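Country-specific dictionaries of the kind suggested above are typically selected by a locale identifier, with a fallback to the generic language when no regional variant exists. The Python sketch below illustrates that lookup, assuming the common language_TERRITORY convention for tags such as ar_MA or ar_DZ; the dictionary file names and the set of "installed" dictionaries are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical installed dictionaries, keyed by locale tag (language_TERRITORY).
# Only one regional Arabic variant exists here; others fall back to "ar".
INSTALLED = {
    "ar": "ar.dic",
    "ar_MA": "ar_MA.dic",
    "fr": "fr.dic",
    "fr_FR": "fr_FR.dic",
}


def candidate_tags(locale_tag: str) -> list:
    """Return lookup candidates from most to least specific.

    For example "ar_DZ" gives ["ar_DZ", "ar"]; hyphens are treated as underscores.
    """
    parts = locale_tag.replace("-", "_").split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def pick_dictionary(locale_tag: str) -> Optional[str]:
    """Return the most specific installed dictionary for a locale, or None."""
    for tag in candidate_tags(locale_tag):
        if tag in INSTALLED:
            return INSTALLED[tag]
    return None


for tag in ("ar_MA", "ar_DZ", "ar-TN", "sw_KE"):
    print(tag, "->", pick_dictionary(tag))
```

With this table a Moroccan locale gets the regional word list, Algerian and Tunisian locales fall back to the generic Arabic list until regional ones exist, and a locale with no dictionary at all returns None.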

11 This is not to suggest that ELWCs in Africa have no connection with African cultures, but that the connection is different and, for obvious reasons, less deep.


Within the Arabic-speaking world, this report focuses on the countries of North Africa, while acknowledging their important cultural and historic connections with the rest of the Middle East.

2.4 Who Localises?

The question of who benefits, or potentially might benefit, from localisation has already been touched upon above in discussing why localisation is important (2.1). It is also useful to briefly consider who does localisation and would thus more immediately benefit (in terms of information, networking, tools) from this project.

The question has as many dimensions as there are types of localisation. Yet a simple answer might be that anyone who is motivated to connect African languages with the content and interactive language of ICT, has the means to do it, and actually initiates or participates in some aspect of localisation is a localiser. The profile of a localiser would also include higher than average education, working knowledge of ICT, and knowledge of at least two languages: a dominant language used in ICT and the one into which to localise. This is a very select group in any setting, and all the more so in Africa.

In terms of the origin and location of localisers, one might identify three broad categories: Africans in Africa; Africans residing in other parts of the world; and non-Africans who have strong knowledge of, and interest in, African localisation, including knowledge of the languages concerned. In parts of the continent that are better off economically, educationally, and in terms of technical infrastructure, such as the North or South Africa, the first category is stronger. However, the latter two categories can in some ways reinforce the first. On the other hand, in some contexts localisers from outside Africa may initiate or drive localisation efforts, for instance African expatriates developing one or another kind of project in their home languages, commercial interests, or international development organisations.

Another valid characterisation would be that content localisers require language skills but less depth in technical skills, while software localisers require both. The fundamental concern of equipping systems, whether in designing an ICT4D/E project, for instance, or in managing a cybercafé, requires mainly an awareness of internationalisation issues and familiarity with local language needs.

People who localise also range in skill sets, so groups of individuals with complementary skills in language and technology make a logical team. This implies some level of organisational skill to coordinate efforts and plan actions. Since localisation implies products destined for a market (whether or not those products are free), marketing is another concern. This means that “who localises” may also include people who bring mainly these organisational and marketing skills to a collaborative effort. Motivation to work on localisation can thus be considered the first defining characteristic of people who localise.


2.5 What is the Current State of Localisation across this Region?

One of the purposes of this report is to give a better idea of what is happening with localisation in Africa, both as information for localisers, policymakers, and ICT for development experts, and also as a benchmark of sorts to evaluate the effectiveness of future localisation efforts.

In general one can say that the potential for localisation is great, but despite growing interest in Africa, the current level of activity varies, and is generally small, with some differences between regions in the degree and character of localisation initiatives and related local or multilingual ICT efforts.

Section 7 of this paper will discuss recent and current localisation efforts. The intervening sections will, in addition to exploring the need and potential for localisation, set the context for understanding the current localisation situation.


3. Introducing "Localisation Ecology"

3.1 An Ecological Perspective of the Environment for Localisation

Localisation is one of the keys to bridging the digital divide – that is to increasing access to and relevance of ICT – and is at the same time a process affected by various factors including those that define the digital divide and other divides that separate the more and less advantaged parts of the world. Understanding how these factors – individually or in combination – constrain or facilitate localisation initiatives is important both for profiling the actual state of localisation in Africa and for suggesting how to assure the sustainability of localisation projects. This paper proposes "localisation ecology" as a conceptual framework for accounting for these factors, their impact and their interaction, and introduces a model to facilitate its application.

Such an abstract framework, although it does not address the immediate and relatively straightforward tasks of translation of software or content, is nevertheless also intended to have practical utility in planning for longer term localisation projects. In terms of this survey document, the localisation ecology framework is further intended to provide a schema for ongoing evaluations of localisation in Africa. And beyond the framing question of the digital divide, this ecological model will hopefully also have utility in considering the impact of localisation on the so-called analogue divide.

Ecology

"Ecology", originally the study of the interaction of organisms in the natural environment, is a concept that has gradually been applied more broadly, first in various models to describe interactions between human societies and the natural environment (human, social, cultural, landscape and political ecologies),12 and subsequently in more abstract senses to describe multifaceted processes in society and individual life (e.g., family ecology, cognition, and linguistic ecology).

This evolution and broader application of ecology as a conceptual tool is entirely appropriate, given its origins in holistic and systems thinking that emerged in the early 20th century.13 In this document, therefore, we will carry this process further to consider localisation ecology, but in so doing, find that other work provides a ready foundation.

In 1970, Einar Haugen proposed using the metaphor of ecology to understand the dynamics of languages and how these relate to other factors (Fill 2001). This ushered in a period in which the study of language ecology or linguistic ecology was popular. Although fewer publications have used either term in recent years, the approach remains an accepted way to analyse the situation of languages in relation to each other and to various social factors.

12 Some examples include models proposed by Duncan (1959), Rambo (1983), and Campbell and Olson (1991).

13 One might note that the South African Jan Smuts articulated the concept of "holism" in 1926.


The idea of applying the ecology concept to various aspects of use of ICT and/or the digital divide, although obviously more recent, is also not new. For instance, Matwyshyn (2003) discusses the gender gap in use of technology in terms of human or social ecology. Another example is the website of World-Information.org which discusses "digital ecology" in terms of "information ecosystems" that aim at "understanding the production, distribution, storage, accessibility, ownership, selection and use of information in technologically determined environments."14

Our interest in developing a model of localisation ecology is in some ways anticipated by Robert Chaudenson’s (1987, 2003) discussion of "integrated language management"15 in terms quite similar to those of ecology. His particular focus on the factors relating to choice of orthography is still relevant, and can be expanded to other technical and linguistic considerations from the point of view of language planning and management. (See also section 3.2 below.)

The Ecology of Localisation

Haugen’s (2001 [1972]) original definition of language ecology serves as a good starting point for the purposes of this discussion: "the study of interactions between any given language and its environment."

Localisation is undertaken within specific contexts. The general rationales for localisation are discussed above. However, in specific countries or for specific languages, the motivations and hopes of localisers may address particular issues arising from conditions, needs, opportunities, and aspirations in their area. In addition, localisation is undertaken in an environment of socio-cultural, linguistic, policy and legal, educational, technological and economic factors and trends. The framework of localisation ecology ideally provides a way of accounting for factors and trends, as well as their potential interaction.

Localisation ecology, therefore, is first of all a way of understanding:

• the factors affecting the potential for localisation and the effort to make a successful localisation,

• how they affect localisation (facilitate, limit, etc.), and

• interaction among these factors in ways important to localisation.

It is important, however, to remember that localisation is not only affected by various factors, but is also a process that can introduce new dynamics into other spheres of activity, such as the use of ICT, education, the development of languages, and the evolution of the sociolinguistic situation. So the model – like the real world situation it is intended to reflect – is thoroughly interactive.

When we consider a particular localisation effort, what we usually think of is a group of people or an organisation dealing with a range of specific tasks and needing some level of input of resources and perhaps advice and information to achieve them. In fact, the effort is dependent on other parts of what is actually a system. If this fact is not clear when the effort is launched, it is likely to become so later, as obstacles and new phases of localisation are encountered. The immediate tasks of localisation are therefore just part of the story, one part of a wider range of concerns that need to be taken into account for successful and sustainable localisation initiatives.

14 See http://world-information.org/wio/readme/992006691 .

15 In French: "aménagement linguistique intégré."

The encoding of the Tifinagh script in Unicode, although not strictly speaking a localisation project, is a good case in point. A proposal to encode the script had not progressed beyond an initial draft for several years. Then, in 2003, when the Moroccan government decided to use Tifinagh in teaching the Tamazight (Berber) language in schools, this highlighted the need for the ability to use the script freely in computing and the internet, something that was hindered by reliance on legacy encodings. In effect, an education policy decision revived the effort to encode Tifinagh, and these two factors along with others may lead to localised content and software in the Tamazight language, which in turn will have other effects.
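Once a script has its own Unicode block, text in it can be stored, searched and exchanged without the font-specific legacy encodings mentioned above. The Python sketch below checks whether a string is written entirely in the Tifinagh block (U+2D30 to U+2D7F) and shows its UTF-8 size. The sample word is "Tamazight" spelled in Tifinagh letters, written as escapes to avoid font problems; the helper name is illustrative only.

```python
import unicodedata

# Unicode block assigned to Tifinagh: U+2D30..U+2D7F.
TIFINAGH_START = 0x2D30
TIFINAGH_END = 0x2D7F


def is_tifinagh(text: str) -> bool:
    """True if the text has at least one letter and every
    non-space character lies in the Tifinagh block."""
    letters = [ch for ch in text if not ch.isspace()]
    return bool(letters) and all(
        TIFINAGH_START <= ord(ch) <= TIFINAGH_END for ch in letters
    )


# "Tamazight" in Tifinagh, letter by letter: t-a-m-a-z-i-gh-t.
word = "\u2D5C\u2D30\u2D4E\u2D30\u2D63\u2D49\u2D56\u2D5C"

print(is_tifinagh(word))          # True
print(is_tifinagh("Tamazight"))   # False: Latin letters
print(len(word))                  # 8 code points
print(len(word.encode("utf-8")))  # 24 bytes: 3 per character in this range
print(unicodedata.name(word[1]))  # TIFINAGH LETTER YA
```

Under a legacy font encoding, the same bytes would display as Latin letters on any machine lacking the special font; with Unicode the characters keep their identity everywhere.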

The interaction of factors might also be illustrated in this way: in cases where a government or a donor announces a project to establish rural telecentres (as in Ghana recently16) or to supply computers to schools (as in Rwanda recently17), the availability or otherwise of localised software makes a big difference to the options of the project and what it can provide. Such programs in turn could provide impetus and resources for localisation.

Localising software also depends on some levels of standardisation of orthographies, terminology, and dictionaries, which in turn might benefit or not from government language policies as well as other institutional programs on local languages (e.g., at universities, literacy agencies, or non-governmental organisations such as SIL International), but which also might be catalyzed by localisation initiatives.

Such examples point to another important feature of this approach to understanding the interaction of factors: scalability. Various decisions and actions relating to ICT and/or localisation in Africa are taken at more local or broader levels. The interaction of donors and national governments on questions of ICT for development or education, for instance, affects the environment for interactions at more local levels. Conversely, the localisation of software in specific languages, a process that may involve only a few actors at a local level, can influence discussion of language policy at national and regional levels, and even among international donors.

3.2 The "PLETES" Model

There are a number of specific factors that may be discussed in imagining a dynamic and scalable model of localisation ecology. In any ecological system one could say that everything is interrelated directly or indirectly, but for purposes of understanding and analysis it is helpful to identify a limited number of key factors and relationships.

16 Ghana News Agency. 2005. "About 230 rural communities to get ICT centres." http://www.ghanaweb.com/GhanaHomePage/NewsArchive/artikel.php?ID=94999 .

17 Nsengiyumva and Stork, "Rwanda" in Gillwald (2005).


To do so, we may consider other models. Perhaps the closest model is that proposed by Chaudenson (2003) for language management. Even though it was not framed as an "ecology," it does illustrate the interrelationship of various factors in a decision relating to language management. The elements he mentions include the following aspects: linguistic, technical, psycholinguistic (individual reactions), economy (in the sense of economy of usage), and sociolinguistic. This particular example (Figure 1) is adapted from his presentation of its use for choice of an element of writing.

Fig. 1: Model of language management (Chaudenson 2003)

In any such model of interacting factors, per the above discussion, there is always a degree of simplification and a selection of aspects to emphasise for the particular kind of situation to be described. Chaudenson focused on a relatively specific matter in which four aspects of linguistics are considered separately alongside the technical factor: aspects of the language itself, sociolinguistics, psycholinguistics, and economy of use. Social dimensions (in this case, how people interact with an element of orthography) are implied in other factors.

In the case of localisation ecology, we can start by suggesting that the fundamental factors are language, technology, and society or socio-cultural. The latter is, as in the Chaudenson model, sometimes considered as implied, but it merits attention as a separate category. Indeed, although the two factors of language and technology are the immediate focus of accomplishing the translation part of a localisation project, the social and cultural dimension is on the same level of importance when one considers the user dimensions and also the impact of localised technology on society.


Each of these three categories is very broad and includes subdivisions, but as such they are useful for highlighting their importance and interrelationships. It is possible for instance to develop ways of using a language on a computer without consideration of users (other than those working on the project), or to go about developing new systems for users without considering the dimension of languages (other than a dominant one – this we will see below in discussion of "digital divide" projects). The triangle shown in Figure 2 is a simple representation of these three factors and their interconnections.

Fig. 2: Three basic factors

We also know that other factors affect the potential and results of localisation – in effect one must "think outside the triangle." Among these factors, three emerge as especially important: policies and the process that produces them (politics); financing, markets and resource availability (economics); and the schooling and training of people in general skills such as literacy and in use of ICT (education). By adding these three categories, we then have six headings for factors or groups of factors that can be considered as key to localisation:

1. Political. Policies; decisionmaking processes and interplay of interests leading to those. Legal and licensing environment.

2. Linguistic. This includes the linguistic situation of the country/region and aspects of each language. How many languages are there? What is their distribution and speakership? Is there a standardised orthography for each language? Are there diverse dialects within a language?

3. Economic. Standard of living, resources available for various kinds of business, public, social, and philanthropic investment. Individual/family income levels.

4. Technological. Electricity and communications infrastructures, availability of computers (and types and kinds of operating systems), and internet connectivity. How do these factors differ across the territory of a country?

5. Educational. Systems of education (formal, informal), school infrastructure.

6. Socio-cultural. Demographics, social structure, ethnic groups, culture(s), popular and individual attitudes.


These six and the connections between and among them are what make the model a useful tool for understanding the environment for localisation. For convenience of reference, we may refer to it as the "PLETES" model (from the first letters of each of the six factors; see Figure 3). It is not intended to be definitive – there may be other ways to illustrate the same issues and/or the relationships among key factors – but it serves at least as a basis for discussion and a way of keeping the key factors in plain view during discussion.

Fig. 3: The "PLETES" Model

A list of combinations of pairs of factors helps to show the coverage of this model. Some of these combinations may not directly affect the process of localisation, but all of them in one way or another shape the environment for localisation, especially for projects undertaking localisation and their potential for sustainable results:

• Language and society: Sociolinguistics and applied linguistics. Numbers of languages, distribution of speakers, attitudes about languages, how different languages are used in different situations and by different groups, languages in popular culture. Sociolinguistic factors are key in understanding the expressed and latent needs for localised interfaces and content, and people's reception of products of localisation.

• Language and education: Literacy rates – in a multilingual context, by language. Literacy in first (spoken) languages is an important consideration in gauging the potential immediate usership of localised software. Language(s) of instruction in formal education systems, which will have an effect on literacy rates in those languages and indeed their development.


• Language and politics: Language policy and planning, from legislation to implementation. Relates to official attitudes towards localisation, and also to matters such as standardisation of orthographies and government support for African language documentation, periodicals etc. that produce resources that can be used in aspects of localisation.

• Language and technology: Several connections: how the technology supports the languages, including Unicode, keyboard layouts, potential for software localisation, advanced applications; how the languages support the technology – terminology; the translation part of localisation; computational linguistics.

• Language and economics: Resources for language work (documentation, corpus development, terminology), economics of localisation.

• Socio-cultural and education: Rates of school attendance and completion, who is educated, numbers of people with skills in particular areas. A factor in knowing what proportion of the population could actually take advantage of localised software.

• Socio-cultural and political: Who makes the policies? What are the interests? May be important in understanding official attitudes to localisation and other matters such as language policy, and in turn suggests approaches to addressing those as needed.

• Socio-cultural and technology: Who has physical access and rights to use the technology? Attitudes to technology. Impact of technology on culture and society.

• Socio-cultural and economic: Fundamental and generally longstanding socioeconomic issues, including the foundations of the "analogue divide" that often parallels or conditions the "digital divide."

• Education and politics: Educational policy.

• Education and technology: Education about technology; technology in education. Efforts to put computers in schools or give laptops to children are also examples of cases where this is a primary dynamic.

• Education and economics: Investment in education, budgets etc. (e.g., for schools, teacher training, materials development, books for students).

• Technology and politics: ICT policy and planning, including NICI plans. Issues of licensing of software and intellectual property.

• Economics and politics: Economic policy, including development, budget, and donor priorities.

• Economics and technology: Economics of ICT, including such issues as relative resources available for investment in ICTs, attractiveness of outsourcing strategies, marketing of localised software, etc.
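Six factors yield exactly fifteen distinct pairs, which matches the fifteen combinations listed above. The short Python sketch below makes that count explicit and could serve as a checklist generator when profiling a given country or language; the factor labels come from the model, while the helper function is an illustrative assumption.

```python
from itertools import combinations
from math import comb

# The six PLETES factors, in the order the acronym spells them.
FACTORS = [
    "Political",
    "Linguistic",
    "Economic",
    "Technological",
    "Educational",
    "Socio-cultural",
]

# Every unordered pair of distinct factors: C(6, 2) = 15.
PAIRS = list(combinations(FACTORS, 2))
assert len(PAIRS) == comb(len(FACTORS), 2) == 15


def pairs_with(factor: str) -> list:
    """All pairs in which the given factor takes part (5 for any factor)."""
    return [pair for pair in PAIRS if factor in pair]


for a, b in PAIRS:
    print(f"{a} and {b}")
```

Each factor appears in five pairs, so a review focused on one dimension (for example, everything touching language) covers a third of the model's connections.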

In the real world, of course, many factors interact. For instance sociolinguistic and technical factors contribute to popular impressions that ICT can be used only in ELWCs and foreign languages. The educational system, by focusing on the official languages (ELWCs) to the exclusion or marginalisation of first languages, may reinforce this notion. Policies concerning language of instruction in schools and language policy generally mandate such approaches, and economic factors (other development priorities, budget realities, costs, etc.) limit the resources that can be devoted to first language instruction. The level of literacy in first languages, then, may be a factor in manifest demand for localised software.

Another example is how the distribution of speakers of different languages within a country or region compares with access to computers and connectivity (language, technology, society, with policy and economics also being relevant). This is a concern when thinking ahead to marketing localised software.


Yet another example, in the case of localisation of open-source software, is how economics and policies, as well as the quality of translations, affect actual or potential competition with proprietary software.

A localisation effort might not actively consider all of these, nor have to address them, but they are part of the environment it works in. On a small scale, it is certainly possible for motivated individuals and groups who have the necessary skills and at least a minimum of resources to begin, and even complete, the translation of some software without deliberating over such factors. But as the goals become more ambitious and sustainable results are sought, the environment for localisation becomes an ever greater consideration, and a systematic way of looking at that environment becomes necessary.

3.3 Dynamic complexes within localisation ecology

In order to make sense of such a long array of connections, which becomes even more complex as one considers multiple factors, it is helpful to highlight several key relationships or complexes in the system. These include: language-technology-society; language policy and ICT policy (intersections and non-intersections); and factors in sustainable localisation.

Language-Technology-Sociocultural

This is the triangle already introduced above (Fig. 2) within the larger PLETES model (Fig. 4). While other factors cannot be ignored, this triangle represents the core set of dynamics of localisation: sociolinguistics (the languages that people use and how), the connections between language and technology, and the interaction of people with the technology. On the one hand, localisation as translation (language-technology) involves attention to cultural dimensions of communication; on the other, information and communication technology is developed, and localised, for people (language-technology-sociocultural). Developing keyboard layouts, for instance, is more than just finding a way to facilitate input of the characters used to write a language; it also requires consideration of user expectations and existing practice.18
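A keyboard layout is, at bottom, a mapping from keystrokes to characters, and the sociocultural point above concerns which mapping users will accept. The Python sketch below is hypothetical: it assumes a layout that leaves ordinary letters where users expect them and adds a few two-key sequences for ɛ, ɔ and ŋ, characters used in many African orthographies. The ";" sequences are invented for illustration and do not describe the Konyin keyboard or any other product.

```python
# Hypothetical two-key sequences: a base letter followed by ";".
# Capital forms are included so capitalisation behaves as users expect.
SEQUENCES = {
    "e;": "\u025B",  # ɛ LATIN SMALL LETTER OPEN E
    "o;": "\u0254",  # ɔ LATIN SMALL LETTER OPEN O
    "n;": "\u014B",  # ŋ LATIN SMALL LETTER ENG
    "E;": "\u0190",  # Ɛ LATIN CAPITAL LETTER OPEN E
    "O;": "\u0186",  # Ɔ LATIN CAPITAL LETTER OPEN O
    "N;": "\u014A",  # Ŋ LATIN CAPITAL LETTER ENG
}


def apply_layout(keystrokes: str) -> str:
    """Convert raw keystrokes to text, replacing known two-key sequences.

    Keystrokes that do not start a sequence pass through unchanged,
    so ordinary words are typed exactly as on a standard layout.
    """
    out = []
    i = 0
    while i < len(keystrokes):
        pair = keystrokes[i:i + 2]
        if pair in SEQUENCES:
            out.append(SEQUENCES[pair])
            i += 2
        else:
            out.append(keystrokes[i])
            i += 1
    return "".join(out)


print(apply_layout("hello"))  # hello (unchanged)
print(apply_layout("be;"))    # bɛ
print(apply_layout("N;o;"))   # Ŋɔ
```

The trade-off is visible even in this toy: a word ending in "e" typed just before a semicolon would be converted too, so a real design would need an escape mechanism, and choosing one is exactly the kind of user-expectation question the text raises.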

18 The marketing for the Konyin keyboard, for instance, includes the phrase “Does not change how you type! No cryptic codes to remember! No training required!”, an explicit recognition of the importance of this "sociocultural" factor.


Fig. 4: The three key factors of localisation in the PLETES model

Most localisation initiatives, at least at the outset, focus mainly on this set of concerns: technical process, localisation for language and cultural factors, and end users.

Applied Linguistics, Translation in Localisation, Social Uses of Technology

Another way to look at the core set of dynamics as illustrated above is as three dynamics or specialisations that often operate independently one from another: applied linguistics and sociolinguistics; the translation part of localisation; and the social applications of technology. Each of these is illustrated in Figure 5. In effect, localisation brings together aspects of all three.

Fig. 5: Applied linguistics, translation in localisation, and social uses of ICT (l. to r.)


Language Policy and ICT Policy

The model can also be used to illustrate two main policy dynamics that relate to localisation. Policy may affect any or all of the main concerns of localisers, especially the ICT policies of governments and donors and the language policies of governments. Language policy arguably involves mainly politics, sociocultural concerns and, of course, language. Other factors are implicated as well, but these seem to be the main ones. Technology policy arguably gives economic concerns a high priority, both in terms of the investments required and the hoped-for returns. An example is the effort to increase connectivity through a project like the Leland Initiative: the technical issue of expanding the bandwidth available to certain African countries was accompanied by consideration of the regulatory environment for use of that bandwidth (policy and economic considerations). Here, too, other factors are important but (arguably) not to the same degree. The two are depicted in the PLETES model in Figure 6.

Fig. 6: Comparison of main concerns of language policy (left) and ICT policy (right)

If we take this depiction of key dynamics as representative of the priorities of ICT policymaking on the one hand, and language policymaking on the other, there seems to be little intersection of the two.

Localisation work in effect appears to fall between these two important policy concerns. This may signal a need to link ICT and language policy more effectively, perhaps through "localisation policy" frameworks that highlight the missing connections on each side.

This apparent disjuncture between the main factors considered in each of the two types of policy and the processes and considerations involved in their formulation would seem to leave localisers with considerable leeway in their activities, but also with little support.

Dynamics of "Digital Divide" Projects

Efforts to expand access to the technology arguably begin with the technology-sociocultural dimension (supplying computers and connections to communities), but quickly encounter or expand to other concerns: policy (regulations, relation to the country's priorities), economics (costs), and education (training, and often schools) (see Figure 7). As such they overlap significantly with the primary concerns of ICT policy. However, to the extent that language is not actively incorporated into their models (and many projects are not actively engaged with this issue), they do not overlap with some of the primary concerns of localisation or of language policy.

Fig. 7: "Digital Divide" Projects – from basic to more complex dynamics, without language

Dynamics of Sustainable Localisation

The concept of localisation ecology is also useful in considering the important issue of sustainability in localisation. Sustainability in turn operates in two more or less sequential phases. The first may be termed "follow-through," that is, completing a set of localisation projects (software translation); the second, "follow-up," covers marketing the localised software, supporting it in the field, and responding to users' reactions and suggestions. Follow-through requires attention not only to technical and language dimensions but also to other factors, of which financial ones may be of the first importance (the resource flow necessary to sustain the initial effort becomes critical when early enthusiasm encounters various limitations). Follow-up in principle entails many other issues as well, since it relates to communication and marketing, user skills, and even policies. Figure 8 shows the interrelationship of factors for sustainable localisation: (1) those basic to localisation, (2) those in follow-through in a localisation project (adding the economic dimension), and (3) those in follow-up, which may involve all dimensions.

Seen this way, localisation initiatives would seem to need to acquire some of the skills of non-governmental or civil society organisations in order to achieve sustainable production and results.


Fig. 8: Localisation, Localisation follow-through, and Localisation follow-up (l. to r.)

3.4 Relevance to Questions of ICT and Localisation

The framework of "localisation ecology" relates issues such as those mentioned above within a system that affects the potential and path of localisation efforts. On the one hand, the concept and its illustration in the PLETES model are abstractions; on the other hand, they can be used in planning long-term and sustainable localisation efforts, as a way of anticipating issues that may arise and of identifying factors that influence the others and can be used to advantage.

Once one understands more fully that various factors affect localisation, how can that knowledge be used? Perhaps this framework and the other contents of this report could be used in drafting materials and designing training workshops that enable localisation groups to navigate proactively the range of factors, influences and forces that affect the long-term success of their efforts.

Localisation ecology may also be a way of imagining the impact of localisation itself – how does the localisation of a software application in a language affect the other relationships involved?

3.5 Localisation Ecology and the "Digital Divide"

Localisation, and the broader agenda of bridging the "digital divide" of which we believe it is an indispensable part, involves several common factors. Discussion of the divide in terms of various contributing factors is not new, but use of an explicitly ecological framework including language would seem to be productive. Part of the problem with digital divide discussions is that language as a factor in access is often not taken fully into account, even though it is the basis for communication and the "coding" for knowledge.

It would further be of interest to integrate discussion of localisation more fully into discussions of the digital divide and efforts to bridge it, and to that end, the localisation ecology concept and its illustration in the PLETES model might be especially useful.

The following sections will highlight certain of these factors and interactions, and go into more detail. In the next two sections, first the language and sociolinguistic contexts will be discussed, followed by consideration of the technical and access contexts.


4. Linguistic Context

For purposes of this study, the linguistic context to be considered concerns the distribution of speakers of the various African languages, current trends, attitudes of social groups toward use of different languages, the effect of language and education policies on use of and change in African languages, and development of terminologies. In other words, this includes sociolinguistics, policies, geography of language, and contemporary culture. On the level of individual languages, issues of dialect variation, degree of interintelligibility with related languages, and development of terminology also are key considerations.

There are other scholarly studies of the languages of Africa from perspectives of the field of linguistics that treat many of these issues in more depth than will be attempted here. This section will attempt to broadly characterise the main elements of the linguistic environment in which African language localisation efforts are taking place, with attention to how those affect localisation.

4.1 Languages, Dialects, and Linguistic Geography

Four indigenous language families occupy the continent: Afroasiatic (formerly called Hamito-Semitic); Niger-Kordofanian (including Niger-Congo, which is by far the largest group in this family); Nilo-Saharan; and Khoisan (or Click). Within these there are interspersed subgroups, territories shared by speakers of different languages, and more or less gradual dialect differences within languages. In addition, there are indigenised languages from the Malayo-Polynesian (Malagasy) and Indo-European (Afrikaans) families. This creates in many areas a complex kaleidoscope of language distributions, which tends to be more intricate in certain parts of the continent, such as the forested regions of coastal West Africa and Central Africa (according to Ethnologue, Nigeria counts more than 400 languages, Cameroon over 280, and the Democratic Republic of the Congo over 200), and less so in savannah regions like the Sahel. Also, over the vast distances where languages like Arabic, Berber languages (Tamazight, Tamasheq, etc.), Fula, and Swahili are spoken, dialect/vernacular differentiation has occurred. In the past, communication among diverse groups was facilitated either by the interintelligibility of similar dialects in certain languages or by the use of vehicular languages.

European colonisation, which imposed arbitrary borders that split many language communities and overlaid English, French, Portuguese, and Spanish (ELWCs) as administrative languages, added to Africa's linguistic complexity. ELWCs assumed the role of linguae francae for official use and for contact and exchange across the continent and with the rest of the world. They have also tended to divide elites from the mass of society, or at least to be used as markers of elite status (see Mazrui and Mazrui 1998). To a certain extent these languages have over the years also become vehicular in some regions and urban areas, and in some cases have been adopted as the home language among elites. In addition, creoles based on ELWCs have become established either as vehicular languages, such as Krio in Sierra Leone, or as the first language, such as the creoles of Cape Verde to the west and of several Indian Ocean island states to the east.


Languages, Dialects and "Macrolanguages"

The complex linguistic situation in Africa also raises questions about the "borders" of languages and the very definitions of language and dialect. This is contested terrain in linguistics, with those who emphasise distinctions among languages sometimes characterised as "splitters" and those who emphasise commonalities as "joiners."

The researchers of SIL International who have compiled the well-known Ethnologue listing of languages (Gordon, ed. 2005) may fairly be considered splitters. In the language profiles section of this document, several Ethnologue listings for variants of a given tongue are very often consulted together for a single listing that we use in discussing localisation. By Ethnologue's count, Africa has 2092 languages (ibid.).

At the other end of the spectrum one might place the Centre for Advanced Studies of African Society (CASAS), which has been researching groups of languages on the premise that Africa functionally has far fewer separate languages than is often claimed. Its director, Kwesi Kwaa Prah (2002; 2003), suggests that 75-85 percent of Africans speak, as a first or additional language, one of 12-15 "core languages," which are in fact clusters of more or less closely related languages.

Even splitters acknowledge that there may be degrees of interintelligibility among different tongues, and beyond that, the speakers of related tongues may sometimes hold ideas (one might call them ideologies) of their fundamental unity. For such situations, SIL has adopted a category of "macrolanguages."19

The terminology, new and old, including terms such as "language cluster" (closely related but not highly interintelligible varieties), "dialect continuum" (language variation over territory such that communication becomes progressively more difficult with distance), and "dialect levelling" (reduction of differences within a language due to contact), reflects the complexity of the situation.

What all this means for localisers is that in some cases, perhaps many, they will have to negotiate different sets of categories in deciding what to localise for. For instance, in a language like Fula (Fulfulde/Pulaar), which is spoken across much of West Africa though by a minority in each country where it is present, there are clear differences among variants, but also enough similarity to permit communication among speakers of most of them. What should software be localised for: one language, the nine that Ethnologue divides it into, or some set of groupings of close dialects? And what should be the approach for localised content in Fula: the same as for software, or a different set of criteria?

The case of Arabic also merits mention, as Ethnologue has sixteen separate listings for northern and eastern Africa alone (the total number of listings is 40). The difference in this case is that there is an established common standard form – Modern Standard Arabic – unlike the case with many other African languages.
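In software, the question of what to localise for often surfaces as a question of language codes and fallback: if an interface exists only for some varieties, which one should a user of another variety see? The following is a minimal Python sketch of one possible answer, not drawn from any particular product. It truncates a requested tag subtag by subtag, in the manner of the BCP 47 "lookup" scheme (RFC 4647), and then, as an added step of our own, falls back to the macrolanguage before a default. The code table is an illustrative subset of ISO 639-3 assignments (four Fula varieties and three Arabic varieties) that should be checked against the current registry; the names `MACROLANGUAGE`, `fallback_chain` and `pick_locale` are hypothetical.

```python
# Illustrative subset of ISO 639-3 individual-language codes mapped to the
# ISO 639-1 code of their macrolanguage. Verify against the registry before use.
MACROLANGUAGE = {
    "fuc": "ff",  # Pulaar -> Fula
    "fuf": "ff",  # Pular -> Fula
    "fuv": "ff",  # Nigerian Fulfulde -> Fula
    "ffm": "ff",  # Maasina Fulfulde -> Fula
    "arz": "ar",  # Egyptian Arabic -> Arabic
    "ary": "ar",  # Moroccan Arabic -> Arabic
    "arb": "ar",  # Standard Arabic -> Arabic
}


def fallback_chain(tag: str, default: str = "en") -> list[str]:
    """Return candidate locales for `tag`, from most to least specific."""
    parts = tag.split("-")
    # Truncate from the right, e.g. "fuc-SN" -> "fuc-SN", "fuc".
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    # Hypothetical extra step: try the macrolanguage before the default.
    macro = MACROLANGUAGE.get(parts[0].lower())
    if macro is not None and macro not in chain:
        chain.append(macro)
    if default not in chain:
        chain.append(default)
    return chain


def pick_locale(tag: str, available: set[str], default: str = "en") -> str:
    """Return the first available locale in the chain, else `default`."""
    for candidate in fallback_chain(tag, default):
        if candidate in available:
            return candidate
    return default  # last resort, even if not actually available


if __name__ == "__main__":
    print(fallback_chain("fuc-SN"))
    print(pick_locale("fuc-SN", {"ff", "fr", "en"}))
```

With only a pan-Fula `ff` interface available, a user requesting `fuc-SN` (Pulaar as used in Senegal) would receive it. Whether that fallback is acceptable to speakers is precisely the sociolinguistic question raised above; the code only makes the choice explicit.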

19 The macrolanguage, which "joiners" might in some cases simply call a "language" but which in other cases may approximate a language cluster, is a category that arose in the process of reconciling different parts of the ISO 639 standard for codes representing languages.


In other cases, languages that are linguistically closely related may be spoken by groups that actually emphasise their mutual differences (as is apparently the case with Teso and Turkana).

This topic is discussed further below in the context of a system of codes used in ICT to designate specific languages or groups of languages (section 6.3).

Linguistic Geography and Localisation

In addition to the complex distribution of speakers of particular languages and the overlapping of territories where various languages are spoken, the borders inherited from the colonial partition of Africa have also divided linguistic communities. This division has in many cases led to further change and divergence within languages.

There has been discussion over the years of developing a linguistic atlas of Africa and of specific countries, and in some cases maps with language distributions have been produced. Such an atlas project could be incorporated into larger localisation efforts, first to understand the ranges of potential use of localised software and keyboard standards, and second to compare the language distributions with computer and internet access.

4.2 Sociolinguistics and Language Change

African societies are by and large multilingual and very often individuals in them master several languages to varying degrees, and use them in different contexts or together in what linguists call codeswitching. These societies are also seeing a significant degree of change in how people use language, including changes in urban areas, dialect levelling,20 and in the case of less widely spoken languages, impoverishment or contraction,21 endangerment, and extinction. It is not an exaggeration to say that the African linguistic terrain is experiencing many changes, with some of those working in countervailing directions.22

Of major importance is the role of attitudes and perceptions, among Africans as well as among foreigners working for development and education in Africa. It is not uncommon to encounter more or less negative attitudes concerning African languages, ranging from views of them as mere "local languages" of limited utility vis-à-vis ELWCs to doubts about their capacity to express complex thought and scientific concepts. On the other hand, many people are committed to the use and development of their maternal languages. (See further related discussion under Terminology below, section 4.5.)

20 This process involves in effect a blurring of dialect differences due to factors like marriage, movement of people, and broadcast media.

21 This refers to speakers not mastering the language fully and, in the extreme, to no speaker or group of speakers mastering its full range.

22 Among recent sources that survey language change in contemporary Africa is one by Batibo (2005).


Such divergence of opinion is of course to be expected in any complex society regarding any number of topics, and to expect otherwise in the case of language is not realistic.23 Although, to our knowledge, it has never been surveyed, there is certainly significant interest, perhaps latent, in localised software and content among many people.

Negative attitudes, however, may discourage people seeking to use the languages in various ways, including localisers. For localisers the problem is not so much the need to convince potential users who have no interest in localised software and content as the risk that negative attitudes will discourage projects.24 In addition, negative attitudes may retard progress that could benefit potential users who are interested.

Beyond that there are other attitudes that affect localisation potential, such as the notion that use of one language precludes use of another, or that if software or web content is provided in one of the languages that many people speak, there is little or no need to think about other languages.25

Altogether, such attitudes point to the importance of education in localisation efforts. Another area of concern is the status and future of many of Africa's less widely spoken languages, a number of which are considered endangered.26 Many other less widely spoken languages, however, are also experiencing declines, or degrees of contraction, without being close to extinction.27

4.3 Language and Language in Education Policies

Policies within African countries and indeed between and among them affect the possibilities for localisation and as such represent a critical factor in localisation ecology. ICT policies are treated in the next section (5.2), but here policies related to language are considered. In particular, two overlapping policy areas are of concern:28

23 H. Russell Bernard (1996) mentions such diversity of opinions in a discussion of whether linguists should work to preserve indigenous languages.

24 The author encountered the opinion from at least two sources that there is "no huge demand" in Ghana for Ghanaian language interfaces or software. The expectation that large-scale demand must be manifest before interfaces are provided or localisation work begins for various languages overlooks the issue of latent demand.

25 The author has encountered this attitude among some development professionals.

26 The UNESCO Red Book of Endangered Languages lists over 180 languages in Africa it considers endangered at http://www.tooyoo.l.u-tokyo.ac.jp/Redbook/Africa/AF_index.cgi .

27 There have been references for instance to Igbo – a language of Nigeria spoken by at least 18 million people – as being "endangered" based on perceptions of how the language is and is not being used and passed on (see Daily Champion 2004, Lotanna 2005). This obviously stretches the definition of endangered too far, but it also reflects popular interest and concern among many Igbo speakers.

28 A third area might be suggested in the context of localisation for "digital divide" or ICT4D projects, and that is language in development more broadly. This has been treated only to a limited degree in the literature, for instance by Robinson (1996), Prah (2000), Simala (2002), and Ongarora (2002). For this study, however, language in development will be considered as part of the larger issue of language policy.

• Language policy and related concerns of language planning and management; of particular practical interest for localisation efforts are decisions concerning orthographies.

• Education policy, with specific attention to issues of African languages as media of instruction and as subject matter.

Language Policy in Africa

Halaoui (2001) distinguishes between language policy and language management, and in fact a thorough consideration of this subject would merit such a nuanced analysis. However, for purposes of understanding the localisation ecology, it is sufficient to treat the subject as a single problem while acknowledging an underlying complexity.

Language policy therefore is taken broadly to mean the set of legal and administrative mandates and guidelines concerning language use in public life, including such matters as: denoting particular tongues as "official" or "national" languages; the use of particular languages in government, legal systems, development, and education (the latter is discussed below under language in education policy); and standards, such as official orthographies (this is also treated separately, below).

Since independence from European colonisation, one might characterise African language policy concerns as being marked by two features: first, reliance on the former colonial languages (ELWCs) for government administration and education, whether that reliance was codified in law or constitution or not at all;29 and second, much discussion on the role of indigenous languages and how to use them.

A factor favouring use of the ELWCs was the concern in many cases of African governments with having a single common language for "nation building." Bamgbose (1996) characterises this as involving "two complementary myths: the first being that having several languages in a country (multilingualism) always divides; the other being that having only one language (monolingualism) always unites..." (see also Bamgbose 1991:14).

At the same time, discussions at national and regional levels on the use of African languages have tended to result in proposals but little follow-through.30 This perhaps reflects a tension with the focus on one language mentioned above, as well as ambivalent attitudes about the value of African languages vis-à-vis ELWCs.

29 Many African countries do not have a legislated official language (Gadelli 1999). This is borne out by country-by-country research on language policy (see the site L'aménagement linguistique dans le monde http://www.tlfq.ulaval.ca/axl/afrique/afracc.htm , which was one of the references used in compiling the country profiles in Appendix 3 [12.3] of this document). This is not particular to Africa, as numerous countries elsewhere (such as the United States) have not found it necessary to legislate any official language.

30 The website of the African Academy of Languages (ACALAN) has a recapitulation of how many of the declarations and plans of action issued by conferences and meetings in Africa have not been acted on. See http://www.acalan.org .


In any event, there has been a low level of attention given to language policies and planning in Africa, such that Okombo (2001), for instance, calls it a "forgotten dimension in governance and development" (see also Gadelli 1999). Bamgbose (1991: 6) finds that this is due to "a general feeling that language problems are not urgent and hence solutions to them can wait." He goes on to characterise the situation in these terms:

Language policies in African countries are characterised by one or more of the following problems: avoidance, vagueness, arbitrariness, fluctuations and declaration without implementation. (Bamgbose 1991:111)

In recent years there has been some more attention to this area, such as in education (see below), the formation on a regional level of the African Academy of Languages, and the vigorous exploration of some of these issues in post-apartheid South Africa.

In the area of ICT and the potential for localisation, the absence of language policies that actively support African language computing means that localisation will likely depend on initiatives from individuals, organisations and companies.

Language Institutions and Agencies in Africa

Agencies and organisations for research and applied linguistics exist in one form or another in each country, often as a part of government or a university (information on these, where available, is given in the country profiles, Appendix III, section 12.3). There are also some continental and regional institutions.

On the continental level, the African Academy of Languages (Académie Africaine des Langues, ACALAN), based in Bamako, operates under an African Union mandate to facilitate work with African languages. The most notable regional institutions are several that were set up mainly to deal with oral histories. These are listed below, with more information in Appendix IV (section 12.4):

• CELHTO - Centre d'études linguistiques et historiques par tradition orale (Centre for linguistic & historical study of oral tradition)

• CERDOTOLA - Centre régional de recherche et de documentation sur les traditions orales et pour le développement des langues africaines (Regional centre for research and documentation of oral traditions and for the development of African languages)

• CICIBA - Centre International des Civilisations Bantu (International Centre for Bantu Civilisations)

• CIDLO - Centre d'investigation et de documentation sur l'oralité (Centre for investigation and documentation of orality)

• EACROTANAL - Eastern African Centre for Research on Oral Traditions and African National Languages (Centre est africain pour la recherche sur les traditions orales et les langues nationales africaines)

There are non-governmental organisations concerned with language and culture in many countries. On a continent-wide scale SIL International is the most prominent, with offices in many countries.


Orthographies

One area of language policy that has received attention in many African states is that of orthographies for at least the major languages spoken by their populations. In these countries this has usually meant setting rules for transcription in a Latin-based alphabet.

The process of arriving at such orthographies has generally involved various actors beginning in the colonial period, including missionaries and colonial administrators, with the post-independence states building on that history. To accommodate the phonetics of these languages, which often had sounds unfamiliar to Europeans, diacritics or modified letters (the latter often corresponding to characters of the International Phonetic Alphabet) were adopted to represent sounds not distinguished or not present in European tongues. An early effort to standardise usage on a continental level was the "Africa Alphabet" of the International Institute for African Languages and Cultures (1930). This history has been explored in some cases, but neither extensively nor for all African languages.31

Several factors are important to note in considering the topic of orthographies for African languages today:

1. While orthographies are relatively set for some major languages, they are still apparently in flux for others.

2. One of the reasons for changes in orthographies for some languages in recent years has been a mismatch between what fonts on early (and current) computer systems offer and the characters or diacritics adopted for use in print and writing.32

3. There are in some cases separate systems of writing for the same language, sometimes with a degree of competition or even conflict among their respective advocates. This is the case for some languages with old written traditions (see below) and some that have been written only more recently.

4. There are still many languages, mostly less widely spoken ones, that do not have formal writing systems.

5. Conventions and policies concerning orthographies are generally set at the country level by governments or researchers, without much coordination with other states where the same languages are spoken.
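Point 2 can be made concrete. The sketch below assumes, for illustration only, that a legacy system's character repertoire was ISO 8859-1 (Latin-1), an 8-bit set designed around Western European languages, and lists which characters of a sample text fall outside it after Unicode NFC normalisation. The function name and sample strings are our own. The Yoruba sample (e with dot below plus a grave tone mark) is instructive: Unicode has no single precomposed character for that combination, so even after normalisation it remains a letter plus a combining accent, and correct display depends on font and rendering support for combining marks, not merely on the presence of a glyph.

```python
import unicodedata


def chars_outside_latin1(text: str) -> list[str]:
    """Return the distinct characters of `text` (after NFC) not encodable in Latin-1."""
    # NFC composes a base letter and accent into one code point where Unicode has one.
    normalised = unicodedata.normalize("NFC", text)
    missing = set()
    for ch in normalised:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


if __name__ == "__main__":
    samples = {
        "Hausa": "\u0199asa",        # ƙasa, with k-hook (U+0199)
        "Yoruba": "e\u0323\u0300",   # e + dot below + grave, typed decomposed
        "Pulaar": "Pulaar",          # plain ASCII
    }
    for label, text in samples.items():
        for ch in chars_outside_latin1(text):
            print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Running a word list for a given orthography through a check like this is one quick way to see which letters a font or keyboard layout must support before localisation can proceed.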

With regard to the last of these points, it is interesting to note that following independence, a number of African countries, in collaboration with UNESCO, began serious consideration of aspects of language policy, including the possibility of common rules for transcription of the many languages that cross borders. Among these efforts, the experts' meetings in Bamako in 1966 (which made reference to the Africa Alphabet mentioned above) and Niamey in 1978 (which produced its own "African Reference Alphabet") deserve mention.33

31 See for example John E. Philips (2000) on the history of Hausa orthographies. In terms of developing alphabets for multiple languages in a country, particular note should be made of the process in Cameroon, where an effort to develop an alphabet has apparently met with some success (see Tadadjeu and Sadembouo 1984; Tadadjeu 1993).

32 Roger Blench (personal correspondence, 2006) notes, for instance, that much of what Kay Williamson (1984) compiled on orthographies for several Nigerian languages may not be in current use.

This process has yielded a certain level of standardisation which benefits current localisation efforts, at least in West Africa. There have been other efforts along these lines, such as a conference in Okahandja, Namibia in 1996 and current work in several parts of the continent by CASAS.

In most of the numerous countries where writing systems for indigenous languages existed before colonisation, Latin transcriptions were adopted "officially" in their place. The exceptions were Arabic script for the Arabic language and Ethiopic/Ge'ez for several languages in what are now Ethiopia and Eritrea. As a result, several scripts were marginalised from the outset. These include Arabic script (as "Ajami") for several languages of the Sahel, Yoruba, and Swahili; the Tifinagh script for Berber languages; and minority scripts such as Vai, Mende Kikakui, and Bamum. All of these remain in use to some degree, and Ajami transcriptions are apparently in widespread use in some areas. However, language policy has by and large ignored these practices.34

In the 1980s the Islamic Educational, Scientific and Cultural Organization (ISESCO) sponsored an effort to develop a standardised Ajami for several languages in Africa south of the Sahara (see Chtatou 1992). This was based to a certain extent on usage in non-Arabic languages of the Middle East and has apparently not gained wide acceptance for the African languages it was intended for.35

To this picture must be added African scripts of more recent origin. While some of these have become popular – the N'ko script for Manding languages has been adopted by an increasing number of people over the half century since its creation and is now included in Unicode, and the Mandombe script for Kongo has spread in the three decades since its creation – others have not enjoyed much success.36

The relationship of the issue of writing systems, orthographies and Unicode is discussed below (section 6.2).
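Once scripts such as N'ko, Tifinagh and Ethiopic have Unicode encodings, software can at least tell them apart in running text. The sketch below is a rough heuristic of our own, not a standard method: it guesses the dominant script of a string from the first word of each letter's Unicode character name (for example "TIFINAGH LETTER YA"). A proper implementation would use the Unicode Script property (UAX #24), which Python's standard library does not expose. Note that it identifies scripts, not languages: Hausa written in Ajami and Arabic itself would both come out as "Arabic".

```python
import unicodedata
from collections import Counter

# First word of a Unicode character name -> script label (a simplification).
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latin",
    "ARABIC": "Arabic",
    "ETHIOPIC": "Ethiopic",
    "TIFINAGH": "Tifinagh",
    "NKO": "N'ko",
}


def dominant_script(text: str) -> str:
    """Return the script label covering most letters in `text`, or "Unknown"."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():  # skip digits, spaces, punctuation, combining marks
            continue
        first_word = unicodedata.name(ch, "").split(" ")[0]
        script = NAME_PREFIX_TO_SCRIPT.get(first_word)
        if script is not None:
            counts[script] += 1
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["\u07d2\u07de\u07cf", "\u2d5c\u2d30\u2d4e", "\u0199asa"]:
        print(sample, dominant_script(sample))
```

A check of this kind could, for instance, help a content project route text to the right font or keyboard, though mixed-script documents would need per-run rather than whole-string classification.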

33 Some of the documents from these conferences are available online at http://www.bisharat.net/Documents/ .

34 For instance, Naira currency notes in Nigeria include the amount of the note in Hausa, written in Ajami. Until 2007, Hausa was the only indigenous language represented on the currency (at the time of writing, Nigeria has decided to move to Latin transcriptions of Hausa, Yoruba and Igbo on its currency).

35 There are a number of experts who are interested in new research on this topic (Fallou Ngom, personal correspondence, 2006).

36 See Appendix 2 (section 12.2) for more information on major scripts. Concerning the unsuccessful proposals, there have been for instance at least three writing systems proposed over the years for Hausa but not widely used, and in 2005 there was a retired professor in Senegal (Agence de Presse Sénégalaise 2005) and a merchant in the Gambia (Secka 2005) who each announced they had created new scripts for African languages.


Educational Policy and Languages of Instruction

There is increasing attention to the use of first languages in education and to bilingual pedagogy in Africa, at both the country and inter-African levels.37 An extensive discussion of the rationale for first language and bilingual (or multilingual) education is beyond the scope of this paper, but it is generally agreed that this is beneficial for schoolchildren. However, implementation is not simple. These issues can, on the one hand, arouse debate within countries and, on the other, involve problems with teacher training, availability of materials, etc.

How this might affect localisation is another issue. One interesting case mentioned above is how Morocco's decision in 2003 to use Tifinagh in Tamazight education spurred efforts to complete encoding of the script in the Unicode standard. It may be that first language education and localisation efforts are mutually supportive, especially to the extent that ICT for education programs are introduced (such as computers for schools, the One Laptop Per Child project, or the involvement of ICT4D centres in adult literacy).

4.4 Basic Literacy, Pluriliteracy, and User Skills

Among the basic factors that contribute to the intractability of the digital divide is literacy (some others are discussed in section 5). In multilingual contexts, such as those found over most of the African continent, the subject is perhaps more appropriately framed as "pluriliteracy" – being literate in more than one language38 – though it is seldom discussed officially in these terms. User skills in terms of literacy include several possible profiles:

• Fully pluriliterate – that is, able to read and write in all languages they speak (which have a writing system)

• Literate in an ELWC but not their mother tongue – this being the usual outcome of schooling being conducted entirely in the ELWC

• School-leavers with varying but not complete levels of literacy in the language of schooling

• People with little or no schooling who have learned to read their mother tongue or a local lingua franca to some level of proficiency from literacy classes given by national programs, development projects, or traditional education (such as Koranic schools)

• Illiterates and functional illiterates.

The potential user communities of localised content and software in Africa, therefore, tend to be quite uneven in their ability to take advantage of the opportunities that these present. Moreover, this fact points to a fundamental link between education (including both literacy training and policy as pertains to languages of instruction) and localisation. Localisation efforts may do well to associate themselves with, for instance, literacy training programs – both traditional and using ICT where public telecentres have been set up for local development – and computers-for-schools projects, so that students have the opportunity to encounter software and content in their first languages. In terms of localisation ecology, the factors of language, technology and education each have relevance in developing user skills.

37 It is worth noting that there have been numerous conferences and meetings over the years to discuss aspects of use of African languages in education. Two of the earliest were in 1964 in Abidjan and Ibadan (Sow 1977). Two of the most recent include one on bilingual education in Windhoek, Namibia in August 2005 and one on languages and education in Africa scheduled for Oslo, Norway in June 2006. A partial list is available at http://www.bisharat.net/Documents .

38 There are actually two terms used for this. One, "multiliteracy," is also and perhaps more frequently used to describe literacy in multiple media. The other, "pluriliteracy," has been used in some European literature in the more strict sense of the ability to read more than one language. The latter term is used here.

There is also a relationship between orthography and the development of reading and writing skills. This subject is beginning to get more attention39 and has a bearing on computer use in the numerous languages for which writing systems are not well established.

All this said, literacy (pluriliteracy) rates will take time to increase in many countries of Africa. This would seem to indicate the desirability of making more effective use of audio and image in content and user interfaces.

4.5 Terminology and Accommodation of ICT Concepts

One aspect of language change and planning that has particular relevance to localisation is terminology. The broader area of terminology development concerns many fields of which ICT is only one. There is some attention to computer and internet terminology in this field, though it is often left to technical specialists and not linguists to find or develop terms necessary for localisation.

The ways that languages develop or borrow terminology for new and foreign concepts, the process Coulmas (1992) refers to as "language adaptation," involve several considerations. In some cases terms arise from within the community of speakers, but where the technology or details of it are not familiar to most people, terms are either borrowed from another language or invented, often by individuals or groups either self-appointed or designated by some authority. Most of the theory of this is beyond the concern of people who are working on translating software and need only to have ways of referring to various concepts.

Putting together terminology is a process that usually relies on experts in the language who have some familiarity with the technical areas for which the terminology is needed. Indeed, Microsoft has, for its localisation efforts in major African languages, used panels of experts to develop terminology and dictionaries.

The focus of localisation projects with respect to terminology is somewhat narrow, as it should be given their specific needs. However, localisation initiatives should be informed by, and in turn participate in, larger terminology efforts. At the same time it should be noted that there is some debate among linguistic experts about the value of efforts to develop terminology for less widely spoken languages in all scientific domains.

The development of terminology for localisation is also a subset of the larger concern with dictionaries for use with software.

39 For example, see Joshi and Aaron (2005). Ethnologue has a list of resources in this topic area at http://www.ethnologue.com/LL_docs/index/Orthography(Literacy).asp .


5. Technical Context

As part of the effort to better understand the interface of language and ICT in African societies, this section will review the technical context encountered by localisation efforts. It deals with two areas:

1. A nexus of issues relating to connectivity, infrastructure, computer hardware, and operating systems in Africa

2. International trends in two interrelated concerns – FOSS and localisation – and their impact in Africa.

These two areas might merit treatment as separate sections but for fundamental connections in (a) the key concern of "access," and (b) their relationships (actual or potential) to ICT for development programs. In effect, all efforts active in ICT in Africa, whether internal, international, or both, have an ultimate (or at least ostensible) aim of increasing access to the technology. Access, however, is a subject with broad implications.

ICT for development programs have focused in one way or another on access, and as such are often implicated in bringing hardware to Africa, increasing connectivity, and, less consistently, in FOSS and localisation. ICT for development programs and projects then are important contributors to shaping the technical environment for localisation.

With this context, this section will begin with a brief discussion of access and the factors it involves. It will then treat infrastructure issues, hardware and operating systems, and connectivity, followed by international trends in FOSS and localisation, and their bearing on Africa. Mention of ICT for development initiatives will be made as appropriate throughout.

5.1 Access: Physical and Soft

Access to ICT is a fundamental defining factor of the digital divide,40 and hence the focus of a range of activities relating to the technology in Africa. It is generally understood as involving more than mere proximity to and permission to use computers (or other devices) that are connected to the internet (or other network). Various sources have sought to elaborate levels or types of access. For instance, Telecommons (2000), in an early evaluation of the potential of ICT for rural development, discusses "'physical access' to ICT infrastructure and applications, and ‘soft access,’ which [they] define as software and applications which are designed to enable rural African users to utilise ICTs for their own needs and uses once the physical access has been established." In a broader context, the organisation Bridges.org went further to define twelve dimensions of what it called "real access"41 to the technology, of which "relevant content" specifically mentions language.

40 Access is also an important issue where disabilities are involved, but this report will not address that dimension of access in Africa.

41 This presentation is no longer on their site, but can be viewed at http://web.archive.org/web/20041119054155/http://www.bridges.org/digitaldivide/realaccess.html .

In effect, several points emerge in discussing access and localisation:

• Foundations of access – hard realities
   o Availability of functioning computers, power to make them run, and connections to link them is the starting point for discussion of access
   o Costs of establishing and maintaining access, which are generally beyond local means and hence often involve outside support or even initiative
   o Permission to use the devices, whether obtained by fees or other means, often represents a significant cost relative to potential users’ resources (a potential barrier to physical access)

• Access and localisation – how providing access meets the user
   o Two aspects of access – software interface and interactive software on the internet – are ones in which choice of language is important
   o We can adopt Telecommons' term "soft access" to refer to how well these anticipate user needs
   o Localisation is the major part of the process of assuring soft access

• User skills – how the user meets physical and soft access
   o User profiles in terms of, among other things, language and literacy
   o Implies attention to developing user skills, including basic literacy (see above, section 4.4)

Figuratively speaking, access builds towards the user from one end, from hardware and connections to interfaces understandable by potential users (with other factors involved also). User skills, in effect, build from the other end, involving education, training and so on. There is, then, arguably a trade-off or complementarity between soft access and user skills. The more skilled or experienced users are less needy in terms of soft access, while less skilled or experienced users require more attention, including localised interfaces.42

Localisation for enhanced soft access, and the enhancing of users’ skills, therefore emerge as two complementary and essential elements in extending physical access to ICT into effective or "real" access (which in turn implies connections to other concerns, per localisation ecology). However, it all begins with physical access.

5.2 Infrastructure

In discussing efforts to expand ICT in Africa, whether localised or not, there is often reference to other basic technical and infrastructure indicators such as number of telephone lines and level of electrification.43 In effect, the realities of poor communications infrastructure and unavailability of (reliable) power sources limit even well-funded efforts to establish access to computers and the internet. These basic factors are changing slowly, while other solutions such as alternative power sources (primarily solar) and the dramatic growth in numbers of cellphones are altering the equation in some ways.

42 It should, of course, be remembered that skilled users may also have an interest in or preference for localised interfaces.

43 Other, non-technical, factors that impinge on levels of ICT usage in Africa include literacy (mentioned above, section 4.4) and income level.


For purposes of this study, however, these important infrastructural factors will be considered as givens in order to give attention to other technical variables that affect actual and potential localisation in Africa.

5.3 Computer Hardware and Operating Systems

The most basic measure of ICT penetration and of the "digital divide" is the availability or not of devices in working order – primarily computers, but to an increasing degree also portable and handheld devices – that can process, store, and transfer information. In their absence, of course, any discussion of connectivity, access and localisation is moot. Reflections of this fact include efforts such as the one to develop the Simputer in India44 and the recent One Laptop Per Child (OLPC) project to develop and supply inexpensive laptop computers to schoolchildren in poor countries.45

By any reckoning, numbers of computers in Africa are low by comparison to other world regions, and very often those machines are of older make and operating system. There are obvious economic reasons for this state of affairs. Even the efforts by outside agencies to address what they perceive as the hardware side of the digital divide in Africa by supplying new or used computers have limited effect (however well designed or funded such projects might be, they are by nature addressing limited objectives within much larger contexts).

The existence in Africa of many older computers and systems also has implications for the potential use of various kinds of software and multilingual web content. In many cases, older operating systems cannot run newer software, use Unicode fonts, or take full advantage of internet connections. In time these systems will be retired, but given the persistence of the root causes for resort to used computers, it would seem likely that Africa will continue to have a high percentage of computers in use that cannot handle the latest operating systems and software. In other words, there will still be in use computers and systems that cannot take advantage of the most recent advances in internationalisation and localisation, in effect always being a step or two behind the latest technology.
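As an illustration of the Unicode point (added here, not drawn from the source), the Python sketch below shows what can happen when text stored as UTF-8 is opened by software that assumes a legacy 8-bit Windows code page. Windows-1252 is used as an assumed example of an older, Western-configured system, and the helper name `as_seen_by_legacy_system` is hypothetical.

```python
def as_seen_by_legacy_system(text: str, legacy_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then misread the bytes with a legacy 8-bit code page.

    Byte values the legacy code page leaves undefined are shown as U+FFFD.
    """
    return text.encode("utf-8").decode(legacy_codec, errors="replace")


# Open e and eng, letters used in many African orthographies, plus plain ASCII.
for word in ["\u025b", "\u014b", "abc"]:
    print(repr(word), "->", repr(as_seen_by_legacy_system(word)))
```

Plain ASCII passes through unchanged, while each extended letter turns into two unrelated characters, which is the kind of garbling users of older systems can encounter with Unicode content.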

This discussion has dealt mainly with computers, though the rapid spread of mobile telephony in Africa and the increasing capacity of handhelds should be understood together as a harbinger of significant changes in the way we plan for ICT use on the continent, including localisation. The potential for multilingual SMS is already being explored. Longer-term potential for more powerful handheld devices is discussed below (section 9).

5.4 Connectivity and Internet Policy

Another measure of the digital divide in Africa is the level of connectivity – the presence and quality of internet connections.

44 The Simputer project began several years ago in India as a way to address the digital divide. See http://www.simputer.org/ .

45 Spearheaded by Nicholas Negroponte and the Massachusetts Institute of Technology Media Laboratory, this project has a web presence at http://laptop.media.mit.edu/ and http://laptop.org/ .


Basic connectivity on the country level was a major focus for ICT in Africa in the 1990s. Since the success of projects like the Leland Initiative (funded by USAID),46 and the introduction of other infrastructures like those for cell phones and VSATs, attention has turned to extending connectivity from the capital cities out to other regions of African countries.

This evolution has been accompanied, and to varying degrees guided, by discussions and elaboration of policies governing the use of expanded national bandwidth. The Leland project itself involved policy prescriptions as much as it did infrastructure assistance, with the aim being to foster a sustainable organisational configuration to manage optimal use of the bandwidth. This was guided by a market philosophy requiring the national-level beneficiary of increased bandwidth to resell it to privately held ISPs.

More broadly, the UN Economic Commission for Africa (UNECA), through its African Information Society Initiative (AISI) encouraged countries to develop national information and communications infrastructure (NICI) plans to help them determine how connectivity was to be expanded. Altogether, Internet connectivity and policies to guide its use have been closely interrelated.

In technical terms, bandwidth in Africa continues to increase (IDRC 2005) and the number of internet connections is growing at a rapid rate (USINFO 2006). This has been accomplished largely by the use of satellite uplinks in each country; however, the deployment of undersea cables around the coast of the continent is also becoming a factor: in West Africa, the SAT-3/WASC cable, and in Southern and East Africa, the FAST cable.47

However, despite efforts to increase connectivity to all African countries, the actual levels of connectivity between countries and within each country tend to vary significantly.

Some of the structural issues mentioned above affect the potential for expansion of access. In particular, connecting rural centres is a challenge. Some countries have implemented a phone tariff system under which dial-up access costs the same from all parts of the country, which removes one disadvantage, but connection quality and actual accessibility may still be poor.

It is in connectivity that some of the clearest connections can be seen between policy and technology (per localisation ecology and the PLETES model).

This is important to consider since the patterns of language and connectivity/access may indicate priority areas for localisation, and the issue of access links to the motivation for undertaking localisation.

46 The Leland initiative "Africa Global Information Infrastructure Project" formally began in 1995 with a target of extending "full internet connectivity" to 20 or more African countries. See http://www.usaid.gov/leland/ .

47 The Balancing Act (2004-2005) reports on the internet in Africa discuss these cables as does the IDRC (2005) Acacia Atlas.


5.5 Trends in Localisation

As discussed above (section 2.2), localisation as an interest probably began the moment that people started to use computers to store and manipulate human language. With the advent of personal computers and their spread throughout the world, and also with the growth of the internet, various challenges of localising for specific languages – and indeed of internationalising ICT to facilitate that – have arisen and been dealt with (see below, section 6).

Today, localisation is the focus or a concern of a wide range of commercial and voluntary activities. Localisation of internet content is particularly prominent for major international languages, although apparently not as developed for African languages (see below, 7.1).

Localisation of computer software has various aspects and some complexities. First of all, one might suggest that in this area we are really speaking about four different things:

1. Operating systems (Windows, Unix, Linux)

2. Software application or package completely translated (such as OpenOffice, Microsoft Office, or specialised programs such as web browsers)

3. Partial package, in which only the most frequently encountered commands are translated (this is the strategy of the Microsoft Language Interface Pack [LIP])

4. Production tools for the target language without commands or help files translated (i.e., only language settings, input methods [keyboard layout], spelling dictionaries) (this may be seen as an extension of the category "Equipping Systems" discussed above, 2.2)

Numbers 2-4 represent different levels of localisation of commonly used software. Most of the focus now with regard to African languages is on these levels. While operating systems (#1) are translated into major international languages, relatively little is being done for African languages.

Although Microsoft has increased the number of languages it offers its software in, and has announced projects for others, there is still a lot more localisation in FOSS. The OpenOffice suite is localised or being localised into about 90 languages currently.48

Other proprietary general office software such as Corel WordPerfect has focused on major languages with perhaps production tools in less widely used ones.

The increase in software localisation has thus far been driven mainly by improved internationalisation (mentioned above) and by growing demand in major markets. With regard to text, progressive improvements in the ability to render text in diverse writing systems, notably through the single standard for all scripts, Unicode, facilitate multilingual production and use of content (Unicode and scripts are discussed further below, section 6.2). Also, as the linguistic background of users diversifies (as ICT becomes cheaper and more readily available in more places and forms), demand for interfaces in diverse languages becomes manifest. Along with the latter, the potential pool of contributors to fill that demand for localisation (including, where necessary, developing terminology) expands.

48 See http://l10n.openoffice.org/languages.html. In general, open source software and operating systems have been localised to a greater degree than proprietary software (see "Open Source's Local Heroes." The Economist 4 Dec. 2003).

Localisation has also been undertaken by various companies and projects internationally for their specific uses. On the other hand, localisation has not generally figured prominently in ICT for development projects in Africa, with some exceptions.

5.6 Trends in FOSS

FOSS deserves specific attention in discussing the technical context of localisation in Africa due to its demonstrated potential for localisation in diverse languages. This in turn is due to cost advantages and the accessibility of its code.

FOSS includes a range of applications from servers to common "office"-type software (word-processors, spreadsheets, etc.) to more specialised programs like geographic information systems (GIS). But FOSS is as much a movement49 as an approach to developing and marketing software. The FOSS movement has spread in Africa and the Arabic-speaking world much as it has elsewhere, although the extent of use is less, which likely reflects the smaller overall user communities. Nevertheless, FOSS user groups exist in many countries of the continent.

With a large, growing and diversifying base of enthusiasts internationally it presents an unusual range of possible support as well as some challenges for tapping it. (One hope of this PAL project is to facilitate communication and coordination of efforts among FOSS localisers in and interested in Africa.)

FOSS has also received some support from major corporate entities (perhaps ironically) such as Sun Microsystems and IBM. The initiative to develop the popular Debian-based Linux distribution "Ubuntu" was funded by the South African entrepreneur Mark Shuttleworth.

All that said, and despite the potential for localisation, FOSS in Africa, as in much of the global South, is most often conceived of as a way to reduce costs and dependence on proprietary software (notably that of Microsoft). The tie-ins of FOSS with localisation in Africa, however, so far seem to have been somewhat slow to develop (with a few notable exceptions, such as Translate.org.za in South Africa). It seems that at least initially, various country-level FOSS associations mentioned above are focused on promoting use of software such as OpenOffice in the official languages. Regional groupings also reflect this.

So, while FOSS and localisation seem to be meeting to an ever greater degree on the international level, one question then is how to involve the FOSS communities in Africa to a greater degree in the localisation processes in their respective countries and regions.

49 An internet article entitled "A Brief History of Free/Open Source Software Movement" at http://www.openknowledge.org/writing/open-source/scb/brief-open-source-history.html gives some background.


Tools to Facilitate Localisation of FOSS

On a technical level, the potential for localisation of any software, especially for smaller language communities, is limited by the number of people with the requisite knowledge of programming and the languages in question. One tactic for facilitating localisation, then, is to provide tools designed to make it easier for potential localisers who may have the motivation and the language skills, but not the technical background, to translate software.

For example, Dwayne Bailey of Translate.org.za and Javier Sola of Khmeros have developed an interface called "Pootle" to facilitate localisation of OpenOffice by people without high levels of technical expertise.50 The object, in effect, is to "lower the bar" of entry into localisation so that people with language expertise but not much technical or programming background can undertake localisation projects.
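Pootle works with translation files such as gettext PO catalogues, in which each source string (msgid) is paired with its translation (msgstr). The Python sketch below is a deliberately minimal, hypothetical illustration of the kind of bookkeeping such a tool does on a translator's behalf: it reads single-line entries and reports how much of a catalogue is translated. It is not Pootle's code; it ignores plural forms, multi-line strings, comments and escape sequences, and the sample catalogue (with French targets) is invented.

```python
import re

# Invented sample catalogue: English source strings with French translations.
SAMPLE_PO = """
msgid "File"
msgstr "Fichier"

msgid "Edit"
msgstr "Édition"

msgid "Save"
msgstr ""
"""

# A single-line msgid immediately followed by a single-line msgstr.
_ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def parse_po(text: str) -> list[tuple[str, str]]:
    """Return (msgid, msgstr) pairs, skipping the empty-msgid header entry."""
    return [(mid, mstr) for mid, mstr in _ENTRY.findall(text) if mid]


def completeness(entries: list[tuple[str, str]]) -> float:
    """Fraction of entries that have a non-empty translation."""
    if not entries:
        return 0.0
    return sum(1 for _, mstr in entries if mstr) / len(entries)


def untranslated(entries: list[tuple[str, str]]) -> list[str]:
    """Source strings still waiting for a translator."""
    return [mid for mid, mstr in entries if not mstr]


entries = parse_po(SAMPLE_PO)
print(f"{completeness(entries):.0%} translated; to do: {untranslated(entries)}")
```

The translator only ever touches the quoted msgstr text; the surrounding structure, and any program code, stays out of view, which is the sense in which such tools lower the bar.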

This approach has also been undertaken in other contexts such as the compiling of locale data in an online tool. It will be interesting to monitor the effect of opening aspects of localisation to people with language skills and FOSS enthusiasm, but without high computer specialisation.

50 See http://translate.sourceforge.net/wiki/pootle/index?DokuWiki=04b0203d28c36961cbc8160b690994ba .


6. Africa and the Internationalisation of ICT

There are a number of important issues for localisation in Africa relating to the internationalisation of computers and computer systems. Internationalisation refers to the process of improving computers, systems, and internet protocols to be able to accommodate the diverse language needs of the world. As such, it enables localisation and computing in many languages. Related to this is the process of setting standards.

This section discusses several aspects of internationalisation and its implementation, which are important to software and internet localisation (and their use) in Africa. These aspects include: the role of standards in facilitating localisation; Unicode and handling text; keyboard and input systems; language codes and locale data; internationalisation and the Web; internationalised domain names; and other applications.

6.1 The Facilitating Technical Environment

Internationalisation and international standards may be seen as defining the technical environment for localisation and multilingual ICT. Within the context of localisation ecology they may also be understood as technical and policy related approaches to organising language use in ICT. These factors are in continuous change and evolution. Understanding them is essential to full consideration of localisation issues.

Standards enable interoperability, and they provide a predictable environment for users. In the former sense they are part of internationalisation, as with, for instance, the adoption of and progressive additions to the "universal character set" – Unicode (see below, 6.2).

In addition to Unicode, there are other standards that were adopted for various reasons, which enter into expanded use with localisation. One example is the set of codes for languages codified under successive ISO-639 standardisations (see below, 6.4). Others include country codes, most notably two-letter country codes (ISO-3166),51 and four-letter and three-number codes for writing systems (ISO-15924).52 There exist also guidelines for use of these codes in computer applications and internet content, notably RFC-4646.53 Among other things these codes are used in defining locale data (see below, 6.4).
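As an illustration of how these code sets combine, RFC-4646 builds language tags from an ISO-639 language code, an optional ISO-15924 script code and an optional region code (ISO-3166 letters or a UN M.49 number). For example, "ha-Arab-NG" denotes Hausa written in Arabic script in Nigeria. The Python sketch below parses only that language-script-region pattern. It is a simplified, hypothetical helper: it ignores extended-language, variant, extension and private-use subtags, and does not check codes against the IANA registry.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str          # ISO-639 code, e.g. "ha"
    script: Optional[str]  # ISO-15924 code, e.g. "Arab"
    region: Optional[str]  # ISO-3166 alpha-2 code or UN M.49 number


# language[-script][-region] only; full RFC-4646 tags allow more subtags.
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_tag(tag: str) -> Optional[LanguageTag]:
    """Parse a simple tag and apply RFC-4646's recommended letter case."""
    m = _TAG.fullmatch(tag)
    if m is None:
        return None
    script, region = m.group("script"), m.group("region")
    return LanguageTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )


for t in ["ha-Arab-NG", "ha-Latn-NG", "AM-et", "sw-002", "not-a-tag"]:
    print(t, "->", parse_tag(t))
```

Tags are case-insensitive, so "AM-et" and "am-ET" name the same thing; normalising case as above makes them compare equal. "002" is the UN M.49 code for Africa as a whole.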

There is also a set of standards for computer keyboards – ISO-9995 – that includes guidelines for keyboard layouts (see below, 6.3).

51 There are also three-letter and three-number codes. See http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_3166.html . In addition there is a draft for country territories. See http://en.wikipedia.org/wiki/ISO_3166 for more information.

52 See http://www.unicode.org/iso15924/iso15924-codes.html .

53 See http://www.ietf.org/rfc/rfc4646.txt . Earlier versions were RFC-1766 and RFC-3066.


Internationalisation also helps provide the predictable environment for users, a process which, taken a step further, involves aspects of localisation. The latter involves standards ranging from orthographies to terminology to keyboard layouts, which are or would be set on language, country, or regional levels.

6.2 Handling Text: Unicode and Complex Script Requirements

Orthographies and writing systems used for African languages were discussed above (section 4.3). Representing these on computers and the internet has presented some challenges when extended Latin and non-Latin scripts are involved. This has in principle been resolved with Unicode, but there are still issues with that standard that are being worked on.

Background

Originally, character encoding for computers used a 7-bit system (128 codepoints, or spaces for letters or other information), the most commonly known version of which is the English-based ASCII.54 ISO-646 incorporated this and defined some additional uses for other languages. A later 8-bit encoding (256 codepoints), sometimes called extended-ASCII or ANSI,55 provided more spaces for diverse characters and alphabets.

The earliest way of accommodating diverse script needs involved creating fonts in which some of the characters in another character set (ASCII or ANSI) were "changed." In other words, new characters were assigned to codepoints normally occupied by other characters.

In response to use of more languages in ICT, a series of standards was developed under ISO-8859.56 Microsoft developed some similar character sets such as Windows-1252 for Latin and Windows-1256 for Arabic. These use 256 codepoints (8 bits or 1 byte), of which the lower 128 (0-127) are identical with those of ASCII and the upper 128 (128-255) differ. However, no commercial or industry standard (e.g., in ISO-8859) was ever developed specifically for any sub-Saharan African language or group of languages.
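The relationship between these code pages can be checked with the codecs that ship with Python. The sketch below (an added illustration, not from the source) confirms that byte values 0-127 decode exactly as ASCII under ISO-8859-1, Windows-1252, ISO-8859-6 and Windows-1256, and shows one upper-half byte, 0xC7, becoming a Latin letter in the first pair and an Arabic letter in the second.

```python
from typing import Dict, Optional

CODE_PAGES = ["latin-1", "cp1252", "iso8859_6", "cp1256"]


def lower_half_is_ascii(codec: str) -> bool:
    """True if bytes 0-127 decode exactly as they do in ASCII."""
    lower = bytes(range(128))
    return lower.decode(codec) == lower.decode("ascii")


def interpret(byte_value: int) -> Dict[str, Optional[str]]:
    """Decode one byte under each code page (None where it is unassigned)."""
    result: Dict[str, Optional[str]] = {}
    for codec in CODE_PAGES:
        try:
            result[codec] = bytes([byte_value]).decode(codec)
        except UnicodeDecodeError:
            result[codec] = None
    return result


print(all(lower_half_is_ascii(c) for c in CODE_PAGES))
print(interpret(0xC7))
```

The same stored byte therefore means different letters depending on which code page the reader assumes, which is why text in these encodings cannot be exchanged reliably without knowing the encoding.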

On the other hand, another standard was devised in 1983 for African languages transcribed in extended Latin alphabets – ISO-6438 "African coded character set for bibliographic information interchange" – but this was apparently little used, even for the primary purpose indicated in its title.57 Curiously, although the time of its development was about the same as that of the "African Reference Alphabet" of Niamey in 1978 (see above, 4.3), it appears that the two were developed separately.58

54 American Standard Code for Information Interchange.

55 American National Standards Institute. ANSI is a bit of a misnomer as the institute never formally adopted drafts of this standard. Nevertheless they were used as "Windows ANSI" and the term is commonly used.

56 There are fifteen in all. See http://en.wikipedia.org/wiki/ISO_8859 ; an older detailed description is available at http://czyborra.com/charsets/iso8859.html .

57 ISO-6438 is copyrighted and not available for viewing online. Uncopyrighted versions can be viewed at http://www.itscj.ipsj.or.jp/ISO-IR/039.pdf (before ISO-6438 was adopted in 1983) and http://anubis.dkuug.dk/jtc1/sc2/open/02n3129.pdf (after 1983). (NB- This is not in current use.)

African language transcription needs on computers (generally PCs running DOS or later Windows) were therefore initially fulfilled by the development on the local level (and in some cases outside of Africa) of various 8-bit encodings for fonts – often several in each country – in which varying sets of "unused" or little used characters in other encodings were substituted with the needed extended characters and diacritics.
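Converting text produced with such locally devised fonts into Unicode amounts to applying a byte-to-character table specific to each font. The Python sketch below shows the idea with a hypothetical mapping: the byte assignments are invented for illustration and do not reproduce any real historical font. Bytes not in the table are assumed to follow Windows-1252, which is also how a system lacking the special font would display them.

```python
# HYPOTHETICAL font table: these byte assignments are invented for illustration.
LEGACY_FONT_TO_UNICODE = {
    0xE0: "\u025b",  # LATIN SMALL LETTER OPEN E
    0xE1: "\u0254",  # LATIN SMALL LETTER OPEN O
    0xE2: "\u014b",  # LATIN SMALL LETTER ENG
    0xE3: "\u0257",  # LATIN SMALL LETTER D WITH HOOK
}


def legacy_to_unicode(data: bytes, fallback: str = "cp1252") -> str:
    """Map font-specific bytes to Unicode; decode all other bytes with the fallback."""
    chars = []
    for b in data:
        if b in LEGACY_FONT_TO_UNICODE:
            chars.append(LEGACY_FONT_TO_UNICODE[b])
        else:
            chars.append(bytes([b]).decode(fallback, errors="replace"))
    return "".join(chars)


raw = b"s\xe1\xe2"                # bytes as stored by the legacy font
print(raw.decode("cp1252"))       # what a system without the font shows
print(legacy_to_unicode(raw))     # the intended letters, now as Unicode
```

Because each local encoding had its own table, migrating a body of legacy documents means first identifying which font produced each file.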

Macintosh computers followed a separate evolution from ASCII to Unicode over nearly two decades (Macintosh Character set, MacRoman, WorldScript). Many users found that the Macintosh systems facilitated their work with African languages (and various non-Latin scripts), but this apparently did not have much impact in Africa, where Macintoshes were and remain relatively rare.59

Arabic, as an international language of high religious and political importance, received a lot of attention early in the process. The script presented challenges from the point of view of script direction (right-to-left with numbers going left-to-right) and changing forms of many characters when preceded or followed by other characters. Nevertheless, coding followed an evolution from an early 7-bit version to 8-bit encodings including ultimately ISO-8859-6 and Windows-1256.60
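In Unicode, the two difficulties mentioned here are handled as character properties rather than as separate codes. The sketch below uses Python's standard `unicodedata` module to show that an Arabic letter carries a right-to-left bidirectional class while European digits are classed as left-to-right numbers, and that a legacy "initial form" code point (kept in Unicode for compatibility) folds back to the ordinary letter under NFKC normalisation. Choosing the contextual shape is left to the rendering engine; the example letter is an arbitrary choice.

```python
import unicodedata

BEH = "\u0628"          # ARABIC LETTER BEH: the form actually stored in text
BEH_INITIAL = "\ufe91"  # ARABIC LETTER BEH INITIAL FORM (compatibility only)

# Bidirectional classes: AL = Arabic letter (right-to-left),
# EN = European number, AN = Arabic number
for ch in (BEH, "1", "\u0661"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))

# The initial form is a compatibility variant of the base letter
print(unicodedata.decomposition(BEH_INITIAL))             # <initial> 0628
print(unicodedata.normalize("NFKC", BEH_INITIAL) == BEH)  # True
```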

Computing in the Ethiopic/Ge’ez script (which has over two hundred characters representing syllables) was the focus of efforts dating back to the early 1980s, but by the late 1990s there were apparently a number of mutually incompatible encodings in use in both Ethiopia and Eritrea. There were at least three major approaches to coding Ethiopic/Ge’ez, including using limited character sets and using up to 4 fonts.61 There was no standard prior to Unicode, and this legacy persists today despite the availability and increasing use of Unicode encoding (much as it does for other languages in extended Latin scripts).

Unicode

Unicode / ISO-10646, the single encoding standard for all the world's scripts, incorporates all the characters in previous standard encodings and is designed to facilitate use and exchange of text in any writing form across all platforms and the internet. Unicode in principle has room for over one million characters (1,114,112 code points), though its latest version (Unicode 5.0, 2006) covers all major (and many other, but not all) writing systems with about one tenth that number.

58 Although largely the same, there were a number of differences in character form between the two. See for instance http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=AfrGlyphVars ; also footnote 65. The forms in ISO-6438 were retained in Unicode.

59 One exception is the Senegalese non-governmental organization ARED, headed by Sonja Fagerberg-Diallo. An early example of the kind of use possible with Macintosh computers of that era was a learning manual for the Pular of Guinea in the extended-Latin orthography that Dr. Fagerberg-Diallo produced in 1986.

60 A web-based presentation entitled "Arabic on the Internet" (2004) gives a succinct history of this development along with other information. See http://baheyeldin.com/arabization/introduction-to-arabic-on-the-internet.html .

61 Information from Daniel Yacob (personal correspondence 2006).

Unicode text is commonly encoded in UTF-8,62 which in many cases uses no more bytes than pre-Unicode encodings: characters in the ASCII range still take a single byte each.
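A quick check in Python (standard library only) shows the trade-off: UTF-8 uses one byte for ASCII characters, two for most extended-Latin and Arabic letters, and three for Ethiopic syllables. The sample characters, and the short Hausa greeting, are chosen only for illustration.

```python
SAMPLES = {
    "a": "Basic Latin (ASCII)",
    "\u025b": "Extended Latin: open e",
    "\u0628": "Arabic: beh",
    "\u1200": "Ethiopic: syllable ha",
}

def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))

for ch, label in SAMPLES.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")

text = "Sannu"  # ASCII-only text costs one byte per character
print(len(text.encode("utf-8")))  # 5
```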

Unicode and Africa

Since many languages in Africa use either extended Latin alphabets or non-Latin scripts (and sometimes both), Unicode would seem to be a natural fit for the continent. However, there are at present several obstacles.

First, although industry is moving to Unicode, and indeed systems have for a long time been designed with the standard in mind, Unicode does not seem to be well understood in Africa, even among computer experts. Many technical experts are occupied with work involving only major international languages (including Arabic), and African language experts, to the extent they work with computers, often still resort to the panoply of legacy 8-bit encodings mentioned above.

This is gradually beginning to change as newer computer systems come into use, discussion of multilingual computing increases, and efforts to facilitate the use of Unicode train more people (among the latter, the effort of the French-funded project RIFAL63 to help national language agencies in West Africa migrate their text banks to Unicode deserves note).
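Migrating such text banks essentially means replacing each reused byte with the proper Unicode character. The Python sketch below assumes a purely hypothetical legacy font in which the ASCII bracket and brace slots were reused for ɛ, Ɛ, ɔ and Ɔ; a real migration needs the actual mapping table for each legacy font, and reading all other bytes as Windows-1252 is likewise only an assumption here.

```python
# HYPOTHETICAL legacy font: which ASCII slots were reused for which letters
LEGACY_MAP = {
    0x5B: "\u025b",  # '[' slot -> LATIN SMALL LETTER OPEN E
    0x7B: "\u0190",  # '{' slot -> LATIN CAPITAL LETTER OPEN E
    0x5D: "\u0254",  # ']' slot -> LATIN SMALL LETTER OPEN O
    0x7D: "\u0186",  # '}' slot -> LATIN CAPITAL LETTER OPEN O
}

def legacy_to_unicode(data: bytes) -> str:
    """Convert text typed with the hypothetical legacy font to Unicode."""
    chars = []
    for b in data:
        if b in LEGACY_MAP:
            chars.append(LEGACY_MAP[b])
        else:
            # Assumption: every other byte follows Windows-1252
            chars.append(bytes([b]).decode("cp1252", errors="replace"))
    return "".join(chars)

converted = legacy_to_unicode(b"k[n[")
print(converted.encode("utf-8"))  # the same text, now storable as UTF-8
```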

Second, there has been some concern expressed in Africa about how well the Unicode standard meets their needs, mainly with regard to use of diacritics. (This is discussed in detail below.)

Another question raised was whether the disk space required by text in Unicode, relative to 8-bit fonts, is a disincentive to its use (see Paolillo 2005:47, 72-73). In practice this is at most a minor problem, given technical advances in handling Unicode (such as UTF-8) and the vast increases in disk space and computer memory made to accommodate much larger files (image, audio, etc.).64

Unicode and Diacritics in Latin Transcription

While Unicode in principle meets the transcription needs of all languages written in the Latin alphabet and its variants, a few issues are still being discussed. Some of these have to do with individual characters, and for those there is an established system for adding characters or modifying certain information.65 However, Unicode's decision in the late 1990s to rely on "dynamic composition", rendering diacritic characters by combining base characters with one or more "combining diacritics" rather than adding more "precomposed" characters for every combination used, has raised some questions.

62 Unicode Transformation Format. There are also other UTFs, such as UTF-16 and UTF-32 (the number indicates the size of the code unit in bits). Some background is given at http://en.wikipedia.org/wiki/UTF-8 .

63 Réseau international francophone d'aménagement linguistique (International Francophone Network for Language Management). See http://www.rifal.org .

64 The author is indebted to Mark Davis, Doug Ewell, and Steve Summit for their clarifications on this matter on the Unicode list (September 2006).

Some language experts familiar with Unicode in Africa or working with African languages have expressed concern about this issue, while others have suggested that the issue is lack of understanding about how Unicode works.

The issue of how to deal with diacritics in some African orthographies has received varying amounts of attention since the late 1990s. For instance, perceiving slow progress on support for dynamic composition in Windows systems, and less interest among developers of Macintosh and Linux systems, the Linguistic Data Consortium of the University of Pennsylvania (US) launched a project in the late 1990s to compile a list of character needs for African languages, with an eye to determining the potential for developing alternative 8-bit standards. This effort, under the name of African Language Resource Council (ALRC), was abandoned after a couple of years, in large part due to advances in the field.

A similar concern, coupled with the thought that reliance on dynamic composition might disfavour African languages, motivated another project by BPI in Canada to develop a set of 8-bit fonts for African languages, an effort that was recognised at the "Internet: Bridges to Development" conference in Bamako in February 2000 (see Bourbeau and Pinard 2000).

In the "prepcom" held in Bamako in 2002 for the first World Summit on the Information Society (held in Geneva in 2003), this situation was brought up again, with the suggestion that a series of 8-bit fonts might lead to the adoption of some new standards for Africa in the ISO-8859 series.66 The concern was expressed that Africa had lost out in the Unicode process when Unicode decided not to add more precomposed Latin characters before the needs of African languages were fully addressed.

More recently, the concern was reframed in a paper delivered at the Unicode conference in 2005 as one of handling data in African languages that use characters in combined forms (Chanard 2005). The issue here was partly that of the long-term implications of using composed characters.

There are three sets of observations to make in response to this persistent line of concern. First, there never seems to have been a thorough assessment of the actual needs. The closest may have been a set of characters compiled as part of the ALRC effort and some research done by John Hudson for Microsoft. (SIL would probably be capable of making such a global summary from its work in various offices around the continent, but to our knowledge it has never done so.) In any event, most of this research is based on what linguistic articles, dictionaries, and the like indicate as characters and combinations "that are used in" whatever language. In many cases the tone markings indicated for a language in such works are used only in specialised texts where clarity is essential, or in learning materials where guidance on pronunciation is important. So it is not clear how much of a problem there actually is.

65 A recent example was the sample glyph used for the upper case Y with hook (used for the ejective y sound in Fula and Hausa), in which the side on which the hook is shown was changed to reflect local usage in West Africa. A discussion of this aspect of this character can be read at http://scripts.sil.org/HooktopYVariants . This apparently stemmed from an earlier divergence between the practice current in Africa (as reflected in the African Reference Alphabet) and the glyph form retained in ISO documents (per ISO-6438). See http://en.wikipedia.org/wiki/%C6%B3 .

66 See http://www.bisharat.net/Documents/Bamako2002-workshop.htm .

It is of interest to note that at least one effort, the Unicode and IDN (Internationalised Domain Names) Project, intends to survey African character needs (see below, 6.6).

Second, the technology for handling dynamic composition has evolved significantly. The ability to position diacritics correctly over base characters, and the possibility of rendering a base character plus a combining diacritic with a single precomposed glyph, go a long way toward obviating concerns about the lack of precomposed characters.

Third, there is also the perspective that the objections to the use of combining diacritics are based on inadequate understanding of the technology.
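The point about dynamic composition can be checked with Python's standard `unicodedata` module. For é, Unicode has a precomposed character, and the precomposed and combining spellings are canonically equivalent, so normalising both to NFC gives identical strings. For ɛ with an acute accent (as used where tone is marked), no precomposed character exists, so the text stays a base letter plus a combining diacritic even after NFC; correct display then depends on the font and rendering engine positioning the accent. The sample characters are illustrative.

```python
import unicodedata

def code_points(s):
    """List the code points and Unicode names of a string."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

precomposed_e = "\u00e9"           # é as one code point
combining_e = "e\u0301"            # e + COMBINING ACUTE ACCENT
combining_open_e = "\u025b\u0301"  # open e + COMBINING ACUTE ACCENT

# Canonically equivalent spellings become identical after NFC
print(unicodedata.normalize("NFC", combining_e) == precomposed_e)  # True

# No precomposed open-e-with-acute exists, so NFC keeps two code points
nfc_open_e = unicodedata.normalize("NFC", combining_open_e)
print(code_points(nfc_open_e))
# ['U+025B LATIN SMALL LETTER OPEN E', 'U+0301 COMBINING ACUTE ACCENT']
```

A practical consequence is that software comparing or searching African-language text should normalise it (for instance to NFC) first, so that the two spellings of é match.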

Unicode and Non-Latin Scripts

Among scripts of Africa, Arabic and Ethiopic/Ge'ez (used for Amharic and Tigrinya, among others, in the Horn of Africa) were encoded in Unicode early in its development. Like the Latin script, both include extended ranges with characters for languages other than the main languages written in them.

Two other African scripts, Tifinagh (used for Berber languages) and N'ko (used mainly for Manding languages), have been added in the last couple of years and are part of the 2006 release of Unicode 5.0. N'ko, however, marks tones with diacritics, and this involves a dynamic composition that is not yet supported.

Other writing systems are being worked on, notably Vai (used for the Vai language in southwestern West Africa) and the process of attending to such minority scripts is being guided by the Script Encoding Initiative at the University of California at Berkeley. These scripts have value for several reasons but are not used by large populations.

6.3 Keyboards and Input Systems

In general, computer keyboards followed the design of typewriter keyboards, which were originally designed for the same languages that ASCII and ANSI supported. With computing requirements for scripts beyond these, methods for facilitating input have had to be devised.

The operation of computer keyboards, however, happens at a more abstract level than that of a mechanical typewriter, even though to a user a keyboard's functioning appears just as tied to the letters printed on the keys. One reason for this is that keyboard layouts are configured in software: the "keyboard driver" can be written or adapted so as to yield any particular character for any key. This in turn can be done in several ways: by the user, changing the commands or shortcuts for individual keys on the computer they are using; by anyone with a keyboard layout program designed to be used with other software, such as Tavultesoft's "Keyman" or Microsoft's "Keyboard Layout Creator" (MSKLC); or by a software programmer or localiser setting the parameters for the keyboard (including possibly providing options for the eventual user) in the software itself. None of these requires any particular attention to what is printed on the keys of the keyboard, though commercial software companies and vendors of computer hardware naturally find it in their interest to coordinate with some kind of standard for languages of major markets.

Another approach exists that involves developing production keyboards (the physical keyboard hardware) and keyboard driver (perhaps with fonts also) for one or more languages. The Konyin keyboard for Nigerian languages is an example.

The following deals with all of the above except for the first (that is, the modification of individual key commands).

Keyboard Layout Creation

For languages with extended Latin or non-Latin scripts, but without any kind of pre-computing input model, it is relatively easy to set keyboard shortcuts or design keyboard layouts. In fact, the existence of programs such as Keyman and MSKLC makes it simple for anyone so disposed to design and share a particular layout.

The design of a keyboard layout, when starting with a pre-existing model (such as the English "QWERTY" or French "AZERTY" keyboards), actually begins with a choice of what letters to retain as well as what to add. Keyboard design programs also offer the option of setting deadkeys. The issue of keyboards is complicated a bit by the fact that there are at least three levels of consideration in their design, and within those, several alternative solutions that can be followed:67

1. General approach to providing for keyboard input of extended characters and diacritics

1.1. Substitution, meaning that a key is reassigned.68 This basically means that one has to change keyboards for each language used. There are two kinds of substitution:

1.1.1. Keys for letters "not used" in a particular language are reassigned to characters or diacritics that are used in the target language but not in the language the keyboard was designed for. In the case of non-Latin alphabets, this may be all the alphabetic keys.

1.1.2. Non-alphanumeric keys on the original keyboard are reassigned to letters in the target language.

1.2. Key combinations (also called modifier keys), meaning use of two keys, usually the Alt, Ctrl, both, or AltGr key plus another (usually a letter key), that together yield something other than what is assigned to the letter key alone. In some cases, such as the Konyin keyboard, there is a special key that functions as AltGr.

1.3. Key sequences

1.3.1. Deadkeys, meaning keys that yield no character when struck but, when followed by another key, yield a character or diacritic that does not appear on the keyboard. This is the feature, for instance, in the Windows "United States International" keyboard option for accents (e.g., the apostrophe, double quote, and circumflex are deadkeys, yielding accented vowels when followed by a vowel). This approach works only where the pair of keys yields a precomposed character, not where two characters (a combining diacritic on a base character) are involved.

1.3.2. Operator keys, meaning that accent keys are struck after the base character (in effect the opposite of deadkeys). This solution is useful for combining diacritics.

1.4. Combinations of the above.

2. Placement or assignment of keys for individual languages.

3. Providing for multiple languages in single layouts for countries or regions of the continent.

67 This outline benefitted from information from Cunningham (personal communication, 2006) and Hoskins (2003).

68 An example of different assignment of keys is the set of differences between the QWERTY and AZERTY keyboards. The placement of the A, Z, Q, and W keys, among others, differs between the two layouts. Similarly, one can, in a keyboard driver, reassign keys without changing what is printed on them in a customised keyboard layout.

In general it seems that keyboard designers take the approach under #1 that they think is best – opinions and preferences vary – and in general focus on #2. A case could be made that there needs to be more consensus in choices under #1 and that in multilingual societies such as those of Africa, the aim needs to shift to #3 as a strategy. On the other hand, efforts to devise keyboards to accommodate many languages can produce layouts that are very complicated.
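The difference between deadkeys (1.3.1) and operator keys (1.3.2) can be sketched in Python. The sketch assumes a deadkey mechanism that, as described above, can only emit a single precomposed code point, and it uses NFC normalisation to look that code point up; the function names and the choice of the acute accent are hypothetical, for illustration only.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def deadkey_output(accent, base):
    """Deadkey struck first, then the base letter: emit one precomposed
    code point if Unicode has one, otherwise None (nothing defined)."""
    composed = unicodedata.normalize("NFC", base + accent)
    return composed if len(composed) == 1 else None

def operator_output(base, accent):
    """Base letter typed first, then an operator key that appends the
    combining diacritic; this works for any base letter."""
    return base + accent

print(deadkey_output(ACUTE, "e") == "\u00e9")     # True: precomposed e-acute
print(deadkey_output(ACUTE, "\u025b"))            # None: no precomposed form
print(len(operator_output("\u025b", ACUTE)))      # 2 code points
```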

Keyboard Design and Standards

Designing keyboards to meet diverse language needs, as part of localising software or creating products, can be as simple as the above but also connects with the larger concern of standards. Standards benefit both localisers and ultimately users, by defining and meeting expectations – in other words, creating a predictable environment for programming, localising, and computer use.

For the proposal and implementation of standard keyboards for a given situation (language or group of languages), there is an international set of guidelines, ISO-9995.69 Among other things, it indicates that a keyboard has three groups of key assignments:70

• Group 1 is the basic layer, with a base and a shift level (lower and upper case).
• Group 2 is the national layer, with a base and a shift level. A locking shift is used to access it.
• Group 3 allows supplemental characters to be entered. It is a single plane and uses a non-locking shift.

Any longer-term strategies for keyboard development would have to consider these guidelines as well as the needs of the languages and expectations of intended users.
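As a rough illustration of the three groups described above, the Python sketch below models one key with hypothetical assignments: a locking shift switches between Group 1 and Group 2 until it is pressed again, while the Group 3 selector is non-locking and affects only the next keystroke. The class, method names and key assignments are invented for illustration and are not taken from ISO-9995 itself.

```python
# HYPOTHETICAL assignments for a single physical key
KEY_ASSIGNMENTS = {
    (1, "base"): "e", (1, "shift"): "E",            # Group 1: basic layer
    (2, "base"): "\u025b", (2, "shift"): "\u0190",  # Group 2: national layer
    (3, None): "\u0259",                            # Group 3: single plane
}

class GroupedKey:
    """Track ISO-9995-style group selection for one key (sketch)."""

    def __init__(self, assignments):
        self.assignments = assignments
        self.group2_locked = False  # locking shift: stays on until pressed again
        self.group3_next = False    # non-locking: applies to the next key only

    def toggle_group2(self):
        self.group2_locked = not self.group2_locked

    def select_group3(self):
        self.group3_next = True

    def press(self, shift=False):
        if self.group3_next:
            self.group3_next = False
            return self.assignments[(3, None)]
        group = 2 if self.group2_locked else 1
        return self.assignments[(group, "shift" if shift else "base")]

key = GroupedKey(KEY_ASSIGNMENTS)
typed = [key.press()]                 # Group 1 base
key.toggle_group2()
typed.append(key.press())             # Group 2 base
typed.append(key.press(shift=True))   # Group 2 shift
key.select_group3()
typed.append(key.press())             # Group 3, this keystroke only
typed.append(key.press())             # back to the locked Group 2
print("".join(typed).encode("utf-8"))
```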

Alternative Input Methods

There are also alternatives to the traditional keyboard in use internationally. These include graphics tablets used as keyboards or with handwriting recognition; virtual onscreen keyboards; LED keyboards that display the active characters in the keys themselves; and speech-to-text. These are briefly discussed below.

69 See http://en.wikipedia.org/wiki/ISO/IEC_9995 .

70 This description is from a webpage on the IBM site entitled "Globalize Your On Demand Business," http://www-306.ibm.com/software/globalization/topics/keyboards/iso.jsp .


A graphics tablet can be used for text input in two ways. One is to make a keyboard template and corresponding software such that touching the location indicated for each character produces that character. The other is handwriting recognition.

Virtual onscreen keyboards are another option, but have some limitations. Virtual keys for special characters in interactive web applications such as forms or email are fairly commonplace, but their use for African languages does not yet seem that widespread (these were used in the African language e-mail sites mentioned above).

A more promising long-term keyboard solution for multilingual computing in Africa and the world is one whose keys each display their current assignment in a particular keyboard selection, and which can therefore in principle accommodate and display any keyboard arrangement. This is an emerging technology being pioneered by Art Lebedev Studios in Russia under the name "Optimus."71

Speech recognition technology and its use in speech-to-text (STT) applications have interesting potential for text input. STT accuracy has become rather good; a noted commercial STT program for English, Nuance's "Dragon NaturallySpeaking", demonstrates this potential.

6.4 Languages, ISO-639, and Locales

Languages can be identified in documents on the web using certain codes, and software can be designed to insert these codes when saving in HTML. The most important of these codes are defined in ISO-639. There are also supplementary language tags defined by the Internet Assigned Numbers Authority (IANA). In addition, locale information, which uses ISO-639 language tags and other information, facilitates localisation. These codes, their relevance for Africa, and the issues they raise about how to define "language" in various ICT applications are discussed below.
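A minimal Python sketch of what such software does when saving: it writes the ISO-639 code into the HTML `lang` attribute. The function name is hypothetical; "ha" is the ISO-639-1 code for Hausa, and the text is escaped with the standard `html` module.

```python
import html

def tag_language(text, lang_code):
    """Wrap text in an HTML paragraph that declares its language."""
    return f'<p lang="{html.escape(lang_code)}">{html.escape(text)}</p>'

print(tag_language("Sannu", "ha"))  # <p lang="ha">Sannu</p>
```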

At the current writing, there are two international standards approved by the International Organization for Standardization (ISO) for identifying languages (ISO-639-1 two-letter codes and ISO-639-2 three-letter codes), one "draft international standard", ISO/DIS-639-3, of three-letter codes for all language categories identified by Ethnologue, and three more parts in formulation (see Table 2).

ISO-639 part | Description | Status | Reference site
1 | 2-letter codes for languages | Existed for several years; formally adopted in 2002 | http://www.loc.gov/standards/iso639-2/php/English_list.php
2 | 3-letter codes for languages & collections | Adopted in 1998 | http://www.loc.gov/standards/iso639-2/php/English_list.php
3 | 3-letter codes for individual languages (exhaustive) | "Final Draft International Standard" | http://www.sil.org/iso639-3/codes.asp
4 | Guidelines & principles for language encoding | Planned | http://en.wikipedia.org/wiki/ISO_639-4
5 | 3-letter codes for language groups | Planned | http://en.wikipedia.org/wiki/ISO_639-5
6 | 4-letter codes for language variations | Planned | http://en.wikipedia.org/wiki/ISO_639-6

Table 2: ISO-639 categories for identifying language, current and planned

71 According to the website at http://www.artlebedev.com/portfolio/optimus/ : "Every key of the Optimus keyboard is a stand-alone display showing exactly what it is controlling at this very moment."

This set of standards serves several purposes, including identification of the languages of web content and the selection of the appropriate locale information, where it exists. There is a certain apparent redundancy in ISO-639-1 and -2, which is explained by their roles for terminology and other uses. In brief, ISO-639-1 uses two-letter codes, which can provide at most 676 (26 × 26) identifiers, far too few to accommodate the world's languages. ISO-639-2, which uses three letters (up to 17,576 combinations), overcomes this shortcoming.

Several African languages (or clusters of languages) including Arabic have ISO-639-1 two-letter codes. A larger number have ISO-639-2 three-letter codes. There does not appear to have been any strict methodology applied in choosing the language categories, as the first two parts include individual languages and categories that group closely related tongues. Moreover, with the advent of ISO/DIS-639-3, which adopts the methodology and categories of Ethnologue's list of languages, a different set of criteria has been introduced into the process.72

The original impetus for ISO-639 was a need for coding for library purposes, and this is reflected in the presence within ISO-639-2 of separate bibliographic and terminology codes. The latest instalment, ISO/FDIS-639-3, uses SIL's and Ethnologue's criteria in attempting to account for all languages. So, among other things, there is also a problem with the "articulation" between ISO-639-1 and -2 on the one hand and ISO/FDIS-639-3 on the other: the latter codes separately what the former in some cases code as single entities. The category of "macrolanguage" has therefore been adopted for those ISO-639-1 and -2 language categories that correspond to several languages as defined in ISO/FDIS-639-3.
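The "articulation" problem can be pictured as a lookup from an older code to the individual languages that ISO/FDIS-639-3 distinguishes. The Python sketch below uses a partial, illustrative table: Fulah (ISO-639-1 `ff`, ISO-639-2 `ful`) with three of its member languages (Pulaar `fuc`, Pular `fuf`, Nigerian Fulfulde `fuv`), and Swahili (`sw`/`swa`) with its two members (`swc`, `swh`). The function name is hypothetical, and a real application would load the complete mapping tables published by the ISO-639-3 registration authority.

```python
# ISO-639-1 two-letter code -> ISO-639-2 three-letter code (partial)
PART1_TO_PART2 = {"ff": "ful", "sw": "swa", "ha": "hau"}

# Macrolanguage -> individual languages in ISO-639-3 (partial, illustrative)
MACROLANGUAGES = {
    "ful": ["fuc", "fuf", "fuv"],  # Fulah: Pulaar, Pular, Nigerian Fulfulde, ...
    "swa": ["swc", "swh"],         # Swahili: Congo Swahili, Swahili
}

def individual_languages(code):
    """Return the ISO-639-3 individual-language codes covered by a code."""
    code = PART1_TO_PART2.get(code, code)
    return MACROLANGUAGES.get(code, [code])

print(individual_languages("ff"))   # ['fuc', 'fuf', 'fuv']
print(individual_languages("hau"))  # ['hau'] (not a macrolanguage)
```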

At some point it would be desirable to consider a more systematic approach to selecting codes for African languages and clusters, perhaps in the process of discussing parts 4-6 of ISO-639. This might optimally involve linguists specialised in African languages as well as perhaps the African Academy of Languages (ACALAN). Indeed, since we are talking about international standards affecting Africa, and a number of African countries are affiliated with ISO (ISO 2006), it would be ideal to have at least some of those countries participate in the process.

72 Each of the profiles in the Major Languages section of this report (Appendix I) includes information on ISO-639 codes for that language.

Locale Data

Locale data is essential for certain uses of languages in computing and on the internet. Texin (2006) describes a locale as "a mechanism used in the Web, Java, and many other technologies to establish user interface language, presentation formats, and application behaviour." It is in effect another way in which internationalisation of ICT facilitates localisation.

A locale consists of basic information on certain needs and preferences, such as the Unicode characters necessary to display text in the language, sort order, currency units, day and date formats, and decimal markers. Completing a locale and filing it with the Common Locale Data Repository (CLDR), managed by the Unicode Consortium,73 is a necessary step in localising ICT to a language. Commonly, locale data is indicated for a language and a country, using ISO-639 and ISO-3166 codes. Presently there are relatively few languages in Africa with locale data (see below, 7.4).
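Decimal markers and digit grouping are a simple example of why locale data matters. The Python sketch below formats one number using two locale records; both the `LocaleNumbers` record type and its values are invented for illustration, and real values for a given language and country would come from CLDR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocaleNumbers:
    """Hypothetical subset of locale data: number separators only."""
    decimal: str
    group: str

# Invented records for illustration; real values come from CLDR
POINT_DECIMAL = LocaleNumbers(decimal=".", group=",")
COMMA_DECIMAL = LocaleNumbers(decimal=",", group="\u00a0")  # no-break space

def format_number(value, loc, places=2):
    """Format value with the locale's decimal and grouping separators."""
    text = f"{value:,.{places}f}"  # Python's neutral form, e.g. 1,234,567.50
    # Swap via a placeholder so the two replacements cannot collide
    return (text.replace(",", "\x00")
                .replace(".", loc.decimal)
                .replace("\x00", loc.group))

print(format_number(1234567.5, POINT_DECIMAL))  # 1,234,567.50
print(format_number(1234567.5, COMMA_DECIMAL))  # grouped with no-break spaces, comma decimal
```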

Filing a locale depends on use of language codes (under ISO-639), country codes (ISO-3166), and writing system codes (ISO-15924).
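How these three codes combine into one identifier can be sketched as follows. The helper is hypothetical: it only applies the customary casing (lower-case language, title-case script, upper-case country) and joins the parts with an underscore, as in many software locale names, or a hyphen, as in web language tags. The Hausa/Latin/Nigeria and Amharic/Ethiopia combinations are illustrative.

```python
from typing import Optional

def locale_id(language: str, script: Optional[str] = None,
              region: Optional[str] = None, sep: str = "_") -> str:
    """Join ISO-639, ISO-15924 and ISO-3166 codes into one identifier."""
    parts = [language.lower()]        # ISO-639 language code
    if script:
        parts.append(script.title())  # ISO-15924 script code, e.g. Latn
    if region:
        parts.append(region.upper())  # ISO-3166 country code, e.g. NG
    return sep.join(parts)

print(locale_id("ha", "latn", "ng"))           # ha_Latn_NG
print(locale_id("am", region="et"))            # am_ET
print(locale_id("ha", "Latn", "NG", sep="-"))  # ha-Latn-NG
```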

6.5 Internationalisation and the Web

Along with other efforts to facilitate the use of languages in ICT are efforts specific to, or with special relevance for, the internet. The UTF-8 implementation of Unicode (mentioned above, 6.2), for example, is increasingly used for multilingual web content and email.

The World Wide Web Consortium (W3C)74 sets standards for the mark-up of webpages to facilitate, among other things, diverse language content.

There are also discussions about the way the web is used that have implications for internationalisation and localisation. For instance, there has for several years been discussion of how the Web may evolve organically into something called the Semantic Web, with certain characteristics facilitating search, linking, and manipulation of information. A little more recently, discussion of Web 2.0 has organised the thinking of some experts and commercial interests about new ways in which the Web can, and indeed does, function and serve needs in increasingly interactive ways.

73 See http://www.unicode.org/cldr/ .

74 See http://www.w3.org/ .

6.6 Internationalised Domain Names

While Unicode and the development of means to render complex script requirements in principle permit content in any language on the internet, another consideration is the names of domains in diverse languages and writing systems. There has been interest in multilingual or internationalised domain names for Africa for some years, as evidenced for instance by the formation of an African chapter of the Multilingual Internet Names Consortium (MINC) called AfriMINC.75

Recently a project with backing by ACALAN, UNDP and the Agence Intergouvernementale de la Francophonie has taken up the issue at a time when the international discussions have become more serious.76

As of this writing ICANN has announced that it will be testing alternate ways of handling internationalised (multilingual) domain names in non-Latin scripts such as Arabic on a larger scale than what has previously been done.77
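For background, the approach standardised in IDNA 2003 converts each non-ASCII label of a domain name into an ASCII "xn--" form using Punycode, so that existing DNS infrastructure can carry it. Python's built-in `idna` codec implements that older standard, which suffices for a sketch; the domain names below are made-up examples, and the sketch says nothing about which alternatives ICANN is testing.

```python
name = "b\u00fccher.example"  # hypothetical domain with one non-ASCII label

ascii_name = name.encode("idna")  # the form actually carried in the DNS
print(ascii_name)                 # b'xn--bcher-kva.example'
print(ascii_name.decode("idna") == name)  # True

# The same mechanism applies to labels in non-Latin scripts such as Arabic
arabic_label = "\u0645\u062b\u0627\u0644"
print(arabic_label.encode("idna"))
```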

6.7 Other applications

A number of other ICTs bear mention when considering the broader context of internationalisation, as technologies and applications of potential importance to localisation. These include: mobile technology; audio-related technologies; geographic information systems; and translation tools.

Mobile technology

Mobile technology (cellular phones, handheld computers, etc.) is a rapidly expanding set of devices reflecting ongoing advances in technology that permit smaller devices to do cheaply what used to require larger ones. At the cheaper end of the range of mobile devices, simple cellphones have become much more accessible to people with lower incomes in the global South. This market has been attracting investment and increased interest in localisation.

On the higher end, the promise of the Simputer model of relatively inexpensive handheld computing has not been realised, but with ongoing miniaturisation of the technology, future possibilities may yet exist.

75 Its director, Dr. Nii Quaynor of Ghana, also served from 2000-3 as At-Large Director of ICANN.

76 An organisational meeting was held in Dakar on 7 September 2005 to launch this effort. Mouhamet Diop of the Senegalese company Next SA organised the meeting.

77 This involves testing of two main alternative ways of handling non-ASCII characters and scripts (Crawford 2006).


Audio Dimensions: Voice, Text-to-Speech (TTS), and Speech Recognition

The transmission, manipulation, and transformation of human speech would seem a natural focus in cultures often described as oral.

Some of the audio technologies are not terribly popular in the technically advanced countries. Audio e-mail or voice e-mail (sometimes v-mail), for instance, never seems to have taken off there and has had limited use elsewhere. At this point, with the technology moving on and various uses of voice over the internet possible with VOIP and mobile devices, other possibilities could be explored – perhaps focusing on voice commands.

Also, combinations of audio, image, and text could be very useful for learning, as well as for serving users with lower literacy skills.

TTS is of obvious interest in settings where people with access to the technology cannot read for one reason or another. STT is also of interest (see alternative input methods, above, 6.3).

Geographic Information Systems (GIS)

Although GIS might seem to be too specialised for consideration in a general survey of localisation, there are several reasons why its expanded use worldwide should be of interest to localisers in Africa. GIS technology is becoming more accessible and there are efforts to use it in participatory analysis and planning for local development. Spatial imaging is an ideal tool in land and natural resource planning on the local level, as it is readily understood by even illiterate people, but also permits very sophisticated layering of information and analysing of data. In fact there is serious effort in various parts of the global South including Africa to combine use of this technology with established participatory research methodologies such as participatory mapping, in what is known as (public) participatory GIS (PPGIS or PGIS).78

The commercial GIS software marketed by ESRI is considered by many to be the industry standard. There also exist a number of FOSS GIS applications79 among which one called the Geographic Resources Analysis Support System (GRASS) is particularly noted.

GIS standards are governed by the ISO-19100 series (concerning mainly standards for geographic data exchange).

Machine Translation (MT) and Translation Memory (TM)

The ability to transform thought in writing or speech from one language to another with the assistance of a computer is one of the most interesting uses of ICT in multilingual contexts, but one that has had relatively little attention in Africa. The technology in this area is evolving quickly and has connections with and implications for localisation work.

78 See http://www.iapad.org/ .

79 A list of such resources is available at http://opensourcegis.org/ .


For convenience, the field can be divided under two headings: machine translation (MT), the automatic translation of speech or text from one language into another by a computer program, in general or specific settings; and translation memory (TM), which is mostly used as a tool to facilitate new translations based on previous translations of the same or similar text.
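As a rough illustration of how a translation memory reuses earlier work, the sketch below stores previously translated segments and returns the closest stored match for a new segment, using the similarity ratio from Python's standard difflib module. Real TM tools use their own matching methods; the function name, the 0.7 threshold and the English/Swahili entries are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Illustrative memory: previously translated source segments -> translations.
MEMORY = {
    "Open file": "Fungua faili",
    "Save file": "Hifadhi faili",
}

def tm_lookup(segment, memory=MEMORY, threshold=0.7):
    """Return (score, stored_source, stored_translation) for the most similar
    stored segment, or None if no segment reaches `threshold` (0..1)."""
    best = None
    for source, target in memory.items():
        # Case-insensitive similarity between the new and the stored segment.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None
```

A call such as `tm_lookup("Open files")` returns the stored pair for "Open file" as a "fuzzy match" for the translator to review, while an unrelated segment returns `None` and must be translated from scratch.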


7. Current Localisation Activity

This section provides an overview of the current state of localisation in Africa, focusing on recent and current activity, with some discussion of potential areas. It is complemented by information in the country and language profiles that make up the appendices (Section 12).80

7.1 Evolution of African Language Use in ICT

This part gives a brief overview of aspects of the use of African languages in computing and on the internet. A very general characterisation would be that African language computing and internet use have been relatively slow to develop, owing to a number of linguistic, educational, policy, and technical factors, some of them very basic, as mentioned above (section 5.2).

However, it is important to keep in mind that computers and the internet, like formal educational systems a century earlier, have been introduced and disseminated as more or less monolingual media relying on one or another ELWC. This is a reflection of both the international dominance in software and internet content of these same languages inherited from colonisation, and the knowledge of these tongues by those people in Africa most likely to have access to the technology (generally elites in urban areas).

African Language Text in ICT

Some specific aspects of the evolution of the use of African languages in ICT are dealt with below. One particular problem, however, affects the many languages written with modified letters or diacritic characters (or entire alphabets) beyond the basic Latin alphabet (the 26 letters used in English) or the ASCII character set (that alphabet plus basic symbols): how computer systems and software handle these characters (see above, 6.2).

Although the earliest personal computer interfaces used the English language and the ASCII character set, attempts were certainly made to render other languages. Such use is hard to quantify, but multilingual computing in Africa, as elsewhere, grew with advances in the capacity of systems to handle larger character sets and with the spread of the internet. The greater but still limited capacity of 8-bit fonts (described by various terms such as "ANSI", as previously mentioned) permitted the development of fonts for more languages.
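To make these limits concrete, the sketch below uses Python's built-in codecs to test whether a few extended Latin characters can be stored in plain ASCII, in Windows code page 1252 (one of the 8-bit encodings often loosely called "ANSI"), and in UTF-8. The sample characters (Hausa ɓ, ɗ, ƙ and Yoruba ẹ) are chosen for illustration, and the function name is hypothetical.

```python
# Hausa hooked letters and a Yoruba letter with a dot below (illustrative sample).
SAMPLE = "ɓɗƙẹ"
CODECS = ("ascii", "cp1252", "utf-8")

def encodable(text, codec):
    """Return True if every character in `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

# For each sample character: its code point and which encodings can store it.
report = {
    f"U+{ord(ch):04X}": {codec: encodable(ch, codec) for codec in CODECS}
    for ch in SAMPLE
}
```

Every sample character fails in ASCII and cp1252 but succeeds in UTF-8 (for example, ɓ becomes the two bytes C9 93), which helps explain why 8-bit "special fonts" had to reassign existing code positions rather than rely on a shared encoding.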

Over the years a number of workarounds have come into common use for dealing with African language text in situations where the available fonts or font compatibility issues prevented use of the official orthography, notably in e-mail and on the web. A summary of approaches to using African language text in this environment, adapted from Osborn (2001), is shown in Table 3:

80 One of the hopes of this study is that continuing to gather progressively more specific information on the country level will facilitate more detailed cross-comparisons of technical possibilities and linguistic needs.


Table 3: Approaches to use of Latin-based orthographies with extended characters and/or diacritics in ICT (adapted from Osborn 2001)

• Standard orthography
  o Current: the "correct" orthography according to established practice, legislation or decree
  o Outdated: a standard that has since been changed (for instance, some usage of Pular of Guinea follows the orthography used until the mid-1980s)
• Substitution solutions
  o Plain ASCII: any extended characters or diacritics are replaced with the closest approximation in ASCII (for example, the BBC's webpages in Hausa are in ASCII, not the standard "Boko" orthography)
  o Other diacritics or combinations: use of capital letters to indicate extended characters; use of other diacritics (an example of the latter is the use of a dieresis instead of a subdot in some languages of Nigeria)
• Use of image files
  o Small images for individual extended or diacritic characters: usually GIF or JPG images inserted into the text at the appropriate position
  o Large images of entire texts: usually PDF files, but also sometimes GIF or JPG
• Hybrid approaches
  o Combinations of the above
  o Introduction of elements of orthography from another language (for example, spellings based on French or English usage that are not part of the orthography)
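The "plain ASCII" substitution in Table 3 can be sketched in a few lines of Python. Diacritics that Unicode encodes as combining marks can be stripped after canonical decomposition (NFD), but hooked letters such as Hausa ɓ, ɗ and ƙ have no decomposition and need an explicit mapping. The mapping, the function name and the choice to turn any remaining unmapped character into "?" are assumptions for illustration, not a documented convention.

```python
import unicodedata

# Hooked letters have no Unicode decomposition, so they are mapped explicitly
# (an illustrative choice of "closest approximation").
HOOKED_TO_ASCII = str.maketrans({
    "ɓ": "b", "Ɓ": "B", "ɗ": "d", "Ɗ": "D",
    "ƙ": "k", "Ƙ": "K", "ƴ": "y", "Ƴ": "Y",
})

def to_plain_ascii(text):
    """Approximate `text` in plain ASCII, as in the substitution workaround."""
    text = text.translate(HOOKED_TO_ASCII)
    # NFD splits letters such as ẹ or é into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (for example ɛ or ŋ) becomes "?".
    return stripped.encode("ascii", "replace").decode("ascii")
```

The loss of information is the point: "ɗan ƙasa" comes out as "dan kasa", and the hooked and plain letters can no longer be told apart.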

These workarounds are still in use to some degree, even as Unicode and UTF-8 (and their accommodation in newer applications) in principle permit the use of larger (and complete) character sets. The substitution approaches, for instance, are especially noticeable on e-mail lists and discussion fora.81 This suggests that, despite the advantages offered by Unicode, its potential is not yet a reality for a wide range of users and internet applications. Another factor

Web content in principle has the same issues, but static pages are able to make use of Unicode, even if this requires entering hexadecimal or decimal character codes for non-ANSI characters in the HTML. The amount of web content in African languages is discussed below.
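The hexadecimal or decimal codes mentioned above are HTML numeric character references, and they can be generated mechanically. This sketch (function name hypothetical) escapes every non-ASCII character, which is safe whatever 8-bit encoding a page declares; Python's built-in "xmlcharrefreplace" error handler produces the decimal form directly.

```python
def to_ncr(text, hexadecimal=True):
    """Replace every non-ASCII character in `text` with an HTML numeric
    character reference (&#x...; if hexadecimal, else &#...;).
    ASCII markup characters such as & and < are NOT escaped here."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # plain ASCII is left unchanged
        elif hexadecimal:
            parts.append(f"&#x{code:X};")
        else:
            parts.append(f"&#{code};")
    return "".join(parts)
```

For example, the Hausa "ɗan ƙasa" becomes `&#x257;an &#x199;asa`, which any browser with a suitable font displays with the hooked letters intact.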

It is also worth remembering that discussions of African language use in ICT repeat some of the same themes as discussions, before the advent of computers, about the choice of orthographies, harmonisation of transcription, etc. These were the subject of study and expert meetings mentioned above (4.3). This context is often forgotten, but in many cases it laid the foundation for current efforts.

81 Several Yahoogroups with significant Hausa content are one example, and a Senegalese forum in which there is Pulaar and Wolof content is another.

Also, and related to the previous point, the use of African languages on computers was preceded by discussions and proposals concerning their use with typewriters and in typesetting. Most of these are now forgotten, but they encountered some of the same input-related issues that are of concern today.

E-mail

E-mail was an obvious first step in the use of the internet in Africa and for a long time its principal use.82

By its nature, e-mail content is harder to track, but other information can be used to get an indication of the use of African languages for this purpose. For instance, at one point there were two web-based e-mail services that provided for composition in several African languages: Africast.com and Mailafrica.net (both have since ceased to function).83 In addition, recent years have seen the setting up of a number of e-mail fora in which much or most of the traffic is in one or another African language. For instance, there are several Hausa and Swahili e-mail lists in which these languages, probably the most widely spoken indigenous tongues on the continent, are the primary languages of communication, and Van der Veken and de Schryver (2003) found fora in Hausa, Somali, and Lingala.

Web content

African languages are represented on the web, but not prominently as media of communication. However, the actual level of use is emerging as a topic of discussion. It is easy to get the impression that African language content is still rare and only gradually increasing; a look at the results of several surveys yields a fuller picture of the current status and evolution of African language web content.

The surveys can be grouped under four headings:

1. Informal surveys
2. Census
3. Statistical estimates
4. Crawlers

A few years ago some simple informal surveys of web content by language that relied on search engines unsurprisingly did not find enough in any African language to rank them as high as some minority European languages with relatively few speakers.84

82 There was even a "web-page by e-mail" service hosted for several years by Kabissa.org, in recognition of the fact that many people in Africa could not access the web but did have limited e-mail access.

83 It would be interesting to know more about the experience of these services. Unfortunately inquiries have yielded no replies.


More focused surveys yield more interesting results. For instance, an informal survey done in Tanzania in 2001 as part of a larger report for the Swedish International Development Agency estimated that ten percent of websites with a Tanzanian focus had at least some Swahili content (Miller Esselaar Associates, 2001), but most of the sites did not have majority content in the language.

An extensive study by Diki-Kidiri and Edema (2003) involved searching, listing and counting websites. It found a significant number of sites that treat African languages in one way or another, but also showed that these generally have minimal content in the languages themselves. In effect, a large proportion of the sites they censused consisted of presentations about African languages, including online dictionaries and instructional pages.

Another approach, taken by Van der Veken and de Schryver (2003), used a different search methodology and statistical extrapolation. By counting hits for particular words and estimating the larger number of words this might indicate, based on the frequency of the searched terms in a typical text, they concluded that there may actually be significantly more African language web content than was commonly thought. However, it is hard to determine from such estimates what kind of content they imply.85
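The general idea of extrapolating from word counts can be sketched as follows: if a word makes up a known fraction of running text in a reference corpus, the number of times it is found suggests roughly how much text exists in total. This is a simplified, assumption-laden illustration, not a reconstruction of Van der Veken and de Schryver's actual procedure. It treats each hit as one occurrence of the word (search engines actually count pages), takes the median of the per-word estimates, and uses invented numbers and placeholder words.

```python
from statistics import median

def estimate_text_size(hits, rel_freq):
    """Estimate the total number of running words of text in a language.

    hits:     word -> number of hits found for that word (assumed to equal
              the number of occurrences, a simplification)
    rel_freq: word -> relative frequency of the word in a reference corpus
              (occurrences per running word)
    Each word gives the estimate hits / rel_freq; the median of these
    estimates damps the effect of unusual words.
    """
    estimates = [hits[w] / rel_freq[w] for w in hits if rel_freq.get(w)]
    if not estimates:
        raise ValueError("no word has a known relative frequency")
    return median(estimates)

# Invented example figures for three placeholder words.
HITS = {"word_a": 2000, "word_b": 500, "word_c": 300}
REL_FREQ = {"word_a": 0.02, "word_b": 0.005, "word_c": 0.001}
```

With these figures the per-word estimates are about 100,000, 100,000 and 300,000 words, so the median estimate is about 100,000. As the text notes, such a number says nothing about what kind of content it represents.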

A current study undertaken by the Language Observatory seeks to more accurately evaluate diverse language content on the web. The project uses a web-crawler with certain analytical capabilities (Suzuki et al 2002).

Analyzing the character of the content in particular languages is of course more complex than estimating the presence of the languages on the web. Diki-Kidiri and Edema’s (ibid.) study seems to be the most revealing in this regard.

Another study, by Ballantyne (2002), which did not look specifically at Africa or at language, offered a schema for categorising content in terms of its origins and audience. To this one might add a third dimension: the subject of the content. Such a schema would be very useful for understanding the nature of the content, by looking at where it comes from and who the intended or anticipated audience is. For instance, the large percentage of sites with African language content that are descriptive reflects the dominance on the web of non-Africans, who may be interested in learning or knowing more about the languages. Looking at content in this way also helps clarify who is localising content for whom and how their work can best be facilitated. Furthermore, by looking at what is not being done in terms of localisation, one can use this schema to analyse the reasons and what might be needed to achieve better results.
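A schema of this kind is essentially a cross-tabulation. The sketch below records each surveyed site along three dimensions (origin, audience, and the added subject dimension) and counts the combinations; combinations with a count of zero point to kinds of localisation that are not being done. The class, category labels and sample records are hypothetical and are not Ballantyne's actual categories.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class SiteRecord:
    """One surveyed website classified along three dimensions.
    The labels used below are hypothetical placeholders."""
    origin: str    # where the content comes from
    audience: str  # who the intended or anticipated audience is
    subject: str   # what the content is about (the added third dimension)

def tabulate(records):
    """Count records for each (origin, audience, subject) combination."""
    return Counter((r.origin, r.audience, r.subject) for r in records)

# Invented sample records for illustration only.
sample = [
    SiteRecord("non-African", "non-African", "description of the language"),
    SiteRecord("non-African", "non-African", "description of the language"),
    SiteRecord("African", "African", "news in the language"),
]
table = tabulate(sample)
```

Because `Counter` returns 0 for unseen keys, looking up a combination such as African-origin content for a non-African audience directly reveals an empty cell in the schema.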

84 A simple survey of websites by language done in 2000 by Vilaweb, the website of a Barcelona newspaper (Pastore 2000), listed no African languages among the 31. A follow-up to the Vilaweb survey which ranked the top 48 languages on the web found Afrikaans 42nd after languages such as Basque and Slovenian, and Swahili last following, among others, Frisian and Faeroese (Mas 2003).

85 A more recent survey by UNESCO (2005) on linguistic diversity on the internet recapitulates the information summarised in this section for Africa.


Web Content about African Languages

Web content about African languages deserves special comment since it is fairly prominent. This is a broad category that includes a range of presentations varying in quality, from the very informal and sometimes incomplete, to very well thought out and sometimes ambitious projects. In general this category reflects the fact that the potential audience for African language topics has been predominantly people with either little or no knowledge of the language on one hand, or knowledge of the language without a great deal of familiarity with its written form on the other (the latter including the people mentioned above [4.4] who are "not literate" in their first language [L1]).

The higher end of this category, if one might put it that way, includes online dictionaries, of which the Kamusi Online Living Swahili Dictionary86 deserves special note; online resources for academic use, such as the Hausa site at UCLA; and some efforts to use the web as an instructional tool. These instructional efforts can in turn be grouped by audience: second (or additional) language learners, generally outside Africa and with little knowledge of the language (for instance at a university); children of expatriate Africans (sometimes called "heritage language" education); and Africans within Africa learning L1 literacy, both those already literate in another language and the non-literate (such as the ALI project in Cameroon several years ago).87

In addition to meeting certain needs and raising the profile of African languages in general, such content in principle also enhances the environment for other kinds of localisation.

New Dimensions in Web Content

Two more recent features on the internet offer new potential for expanding African language content: weblogs and Wikipedia. These are described below and discussed more later (section 9).

Weblogs, or blogs, are becoming increasingly widespread around the world including in Africa. There are already several blogs in African languages. Blogging is a relatively easy way for individuals to produce text content in any language, given that there are free sites offering space to anyone who wishes to start a blog. As long as blogs remain a significant feature in cyberspace we should expect to see more – and facilitate more – content in diverse African languages.

Wikipedia is an online encyclopaedia that is expressly multilingual in its approach. Almost 40 Wikipedia editions have been started in African languages, of which Arabic, Afrikaans and Swahili are the best represented. Many, however, have very little content. Following discussions at the conference in 2006 on how to facilitate the growth of these and other African language editions of Wikipedia, an effort to coordinate work was launched.88

86 See http://www.yale.edu/swahili/ .

87 ALI stands for "Apprentissage des Langues africaines par l'Internet" (learning African languages on the internet). See http://www.kabissa.org/archives/a12n-forum/msg00187.html . This is not to be confused with another online program for second language learners of Akan called "ALI Akan" based in Switzerland (ALI in that case standing for African languages on the internet).

88 This effort uses a Yahoogroup at http://tech.groups.yahoo.com/group/afrophonewikis/ .


7.2 Fonts

Fonts, as an aspect of localisation in Africa, have been a particular issue in countries and for languages that use extended Latin orthographies and/or non-Latin scripts. The following briefly considers these two areas.

Extended Latin

Adapting 8-bit fonts to the transcription needs of many African languages, a practice involving various individuals, organizations, and projects that has been characterised as "anarchic" (Cissé et al 2004), has apparently been fairly common (see above, 6.2). The result has been a number of mutually incompatible "special fonts," now generally referred to as "legacy fonts," that are still in use to varying degrees.89

To our knowledge there is no comprehensive listing of such fonts, but a few examples are given in Table 4.

Table 4: Some Legacy 8-bit Fonts for Extended Latin Scripts in Africa

Location, site or organization | 8-bit legacy fonts | Created by
Mali | Bambara Arial, Bambara Times | Created in connection with an ACCT workshop; late 1990s
Matchfont.com | (font for Gikuyu) | Created by Gatua wa Mbugwa, 1999
Niger | INDRAP98, La Nigeriènne | Created in Niger (?), late 1990s
SIL | Many fonts for general and country-specific usage | Created in the 1990s (and before?)

While the Unicode standard provides for extended Latin characters, fonts that include these characters have only gradually become more widely available. There are additional issues with the support of combining diacritics (such as tone marks in some cases) that have to do with other aspects of software, but which nevertheless affect the usefulness of some fonts. An alternative strategy, using a single glyph for the combination of a base character plus a combining diacritic, is one way to get around this problem.
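The sketch below, using Python's standard unicodedata module, checks whether Unicode defines a single precomposed character for a base letter plus a combining diacritic. Some combinations compose (e plus an acute accent gives é), but others used in African orthographies, such as open e (ɛ) or open o (ɔ) with a tone mark, have no precomposed form and remain two-code-point sequences, so what appears on screen depends on how well the font and rendering software position the mark. The helper name is hypothetical.

```python
import unicodedata

ACUTE, GRAVE = "\u0301", "\u0300"  # combining acute and grave accents (tone marks)

def precomposed(base, mark):
    """Return the single precomposed character for base + combining mark,
    or None if Unicode has no such character (the pair stays two code points)."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None
```

`precomposed("e", ACUTE)` yields "é", while `precomposed("ɛ", ACUTE)` and `precomposed("ɔ", GRAVE)` yield `None`: exactly the cases where a font-level single glyph for the combination becomes useful.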

Non-Latin

A significant number of Arabic fonts are available, either for the 8-bit ISO-8859-6 or Windows-1256 standards or in Unicode. Unicode covers the characters of these standards, of course, plus some additional characters, mostly for non-Arabic languages of the Middle East but also useful for some African languages. It is not clear how well existing fonts accommodate African usage, in part because standards are informal.

89 There are still occasionally new ones created, even though Unicode makes them unnecessary. For example in 2006 a new 8-bit font was announced for the Ewe language in Togo (Togocity.com 2006).

There exist various kinds of non-Unicode font solutions for Ethiopic/Ge'ez, and naturally Unicode fonts are more satisfactory for several reasons. There are some fonts for Tifinagh and N'Ko, but the latter still has some technical issues that are not yet resolved.

7.3 Keyboards for Africa

Computer keyboards designed for Europe and North America (in particular the English QWERTY and French AZERTY) are the rule in sub-Saharan Africa. The only language covered in this survey that has well-established keyboards is Arabic. There are input systems for Ethiopic/Ge’ez in the Horn of Africa, though no standard as of yet. Most of the discussion in this section, therefore, deals with efforts to provide for Latin-based transcriptions, and mainly ones that include extended characters and diacritics (see above, 4.3 and 6.3 for background).

Where languages use essentially the same characters that appear on ELWC keyboards (and in the software), there is generally no need for new input methods. However, the numerous African languages that use extended characters and diacritics pose varying challenges. For many languages for which typewriters were in use when desktop computers were introduced, the typewriter keyboard layout was carried over to the computer keyboard. Such typewriters were few in Africa and, to our knowledge, have had no impact on computer keyboard design.90 So alternative workarounds were necessary.

Africa mostly uses keyboards designed for English or French, though a fair number of layouts have been designed for input of specific African languages.

As indicated above (6.3), input of special characters and diacritics in the Latin script can be provided in a number of ways, such as with Tavultesoft's Keyman program or Microsoft's Keyboard Layout Creator (MSKLC) utility, or simply by assigning keys within a word processing program. These are not particularly hard to implement, and in fact an increasing number of such layouts are available for various languages and for countries or regions.91 Some examples of efforts to design keyboards for African language needs are listed as part of Appendix 5 (Section 12.5).
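Tools such as Keyman let a layout designer define rules that turn sequences of keystrokes into other characters. The sketch below imitates that idea in Python with a hypothetical rule table in which an apostrophe typed after b, d or k produces the corresponding Hausa hooked letter. It is not Keyman's rule syntax or engine, and the choice of the apostrophe as trigger is an assumption; a real layout would also need a way to type a literal apostrophe after those letters.

```python
# Hypothetical rules: (previous character, key pressed) -> replacement.
RULES = {
    ("b", "'"): "ɓ", ("B", "'"): "Ɓ",
    ("d", "'"): "ɗ", ("D", "'"): "Ɗ",
    ("k", "'"): "ƙ", ("K", "'"): "Ƙ",
}

def type_keys(keys):
    """Simulate typing `keys` one at a time, applying RULES as each key arrives."""
    out = []
    for key in keys:
        if out and (out[-1], key) in RULES:
            # Replace the previous character instead of appending the key.
            out[-1] = RULES[(out[-1], key)]
        else:
            out.append(key)
    return "".join(out)
```

Typing `d'an k'asa` produces "ɗan ƙasa", while an apostrophe after any other letter is passed through unchanged.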

90 Williamson (1984:66) mentions some typewriter keyboards for Nigerian languages along with strategies for typing with English keyboards. In the 1980s, the IBM company developed some typeballs with what we now call extended Latin characters for its Selectric typewriter. Mann and Dalby (1987) proposed a lower-case only keyboard for typewriters and computers based on the Niamey African Reference Alphabet, but this never caught on; see http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IntlNiameyKybd (there is actually one keyboard layout that is based on the Mann-Dalby Niamey keyboard, but it includes upper case characters as well).

91 See for instance the Tavultesoft site http://www.tavultesoft.com or the keyboard projects links at http://www.bisharat.net/A12N/Projects .


The issue of keyboard layouts for Latin-based scripts is one that has had the attention of a number of individuals and a few firms. In a few cases, such as in Nigeria, it has also received some official (governmental) attention, but as a general rule keyboards for African languages have not had a wide or systematic consideration. As a result there have been a fair number of layouts designed for one or another situation in Africa, going back some years (and in some cases a keyboard driver and 8-bit font have been developed as part of a package).92

The ease with which one can create and disseminate a keyboard layout has a downside, however. Chantal Enguehard of the University of Nantes and the RIFAL project has expressed concern that the proliferation of layouts might become confusing. She and Naroua Harouna of the University of Niamey are at this writing researching various keyboard layouts for evaluation and comparison.

Discussion of keyboard layouts in Africa inevitably leads to the topic of alternatives to the QWERTY or AZERTY keyboards, or the development of specifically African keyboards. In one case, for instance, a Nigerian linguist based in Germany, Chinedu Uchechukwu, suggested working with the German QWERTZ keyboard, which has one more key than the QWERTY. This made it possible to include on it the extra diacritical characters necessary to compose in Igbo. This idea and some others that led to the creation of several keyboard layouts93 were the outcome of discussions on several e-mail fora (see below, 7.7, Table 6).

The only production computer keyboard we are aware of is the Konyin keyboard for Nigerian languages mentioned above (6.3).94 It follows the thinking that new layouts should probably not depart too much from the keyboards that current users are already accustomed to – generally English or French keyboards.

Currently the entire continent, especially south of the Sahara, uses keyboards designed originally for one or another West European or North American environment.95 To a certain extent, English, French, and Portuguese keyboards are useful because these are also official languages in many African countries, and indeed these keyboards may serve as the basis for more Africanised keyboards. The proliferation of new keyboard layouts for African languages may have some drawbacks; however, out of that process we may find new concepts for production keyboards that work better for Africa than the traditional European ones (as Konyin attempts to do).

Nigeria in particular has seen a number of different efforts to design keyboards to accommodate the special character needs of transcribing its many languages. Between those efforts and others, such as in the case of Francophone African countries using the AZERTY keyboard, one might foresee at least two production keyboards for Africa, each of which could accommodate more than one keyboard layout (see also below, section 9).

92 This was the case for instance in Mali where the 8-bit fonts Bambara Arial and Bambara Times were developed by a project facilitated by the French agency ACCT during the late 1990s.

93 These include several by Andrew Cunningham of the OpenRoad project at http://www.openroad.net.au/languages/files/ (these are also listed in Appendix 5).

94 See http://www.konyin.com/ . It is designed for use with Microsoft Windows software.

95 In one large cybercafé in Bamako in 2000, for instance, the author encountered French, English, and German language keyboards.

Alternative input: Graphics tablet keyboards

Some graphics tablet keyboards were put together (for production or as concepts) by Lee Pearce of Large-Format Computing in 2003.96 This solution would seem especially useful for a syllabary like Ethiopic/Ge'ez (and indeed a graphics tablet keyboard was developed for it), but it has apparently not proved popular in other contexts. The input method is rather slow and requires the use of a stylus. However, an advantage is that, as a USB device, it can be used alongside any conventional keyboard to facilitate multilingual or multiscript input.

7.4 Locale Data for African Languages

Locale data and its importance for localisation and multilingual ICT were introduced above (6.4). At present, however, relatively few African languages have locale data.

In early 2006, Alberto Escudero-Pascual and Louise Berthilson of IT46 launched an online locale generator tool to assist people in compiling locale data for OpenOffice and CLDR.97 This led to the filing of several more locales. Table 5 lists the African languages for which locale data had been filed with CLDR by mid-2006. Locale data is filed for a combination of a language and a country.

Table 5: African languages filed in CLDR 1.4 (as of July 2006)

Language | ISO 639 code used (639-1, 639-2 or 639-3) | Country(ies) filed for
Afar | aa | Djibouti, Eritrea, Ethiopia
Afrikaans | af | South Africa, Namibia
Akan | ak | Ghana
Amharic | am | Ethiopia
Arabic | ar | Algeria, Egypt, Libya, Morocco, Sudan, Tunisia, and several countries in SW Asia
Atsam | cch | Nigeria
Blin | byn | Eritrea
Chewa/Nyanja | ny | Malawi
Ewe | ee | Ghana, Togo
Ga | gaa | Ghana
Ge'ez | gez | Ethiopia, Eritrea
Hausa | ha | Nigeria, Niger, Ghana; Latin and Arabic scripts
Igbo | ig | Nigeria
Jju | kaj | Nigeria
Kamba | kam | Kenya
Kinyarwanda | rw | Rwanda
Koro? | kfo | Nigeria? [there is an error in this locale]
Lingala | ln | Congo, Democratic Republic of Congo
Ndebele, South | nr | South Africa
Oromo | om | Ethiopia, Kenya
Sidamo | sid | Ethiopia
Somali | so | Somalia, Ethiopia, Kenya, Djibouti
Sotho, Northern | nso | South Africa
Sotho, Southern | st | South Africa
Swahili | sw | Kenya, Tanzania
Swazi | ss | South Africa
Tigre | tig | Eritrea
Tigrinya | ti | Eritrea, Ethiopia
Tsonga | ts | South Africa
Tswana | tn | South Africa
Tyap | kcg | Nigeria
Venda | ve | South Africa
Wolaytta, Walamo | wal | Ethiopia
Xhosa | xh | South Africa
Yoruba | yo | Nigeria
Zulu | zu | South Africa

96 See http://www.bisharat.net/A12N/Projects/#tabl .

97 See http://www.it46.se/localegen/ .

A website called "Yeha" focuses on locale data for languages of East Africa.98
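Locale data is filed under an identifier that combines a language code with an optional script code and a country (region) code, as Table 5 reflects for Hausa in Latin and Arabic scripts. The sketch below parses a simplified form of the underscore-separated identifiers used in CLDR locale file names (language_Script_REGION, for example an identifier of the form ha_Arab_NG). The pattern ignores variants, numeric region codes and other details of the full syntax, and the function name is hypothetical.

```python
import re

# Simplified pattern: language (2-3 lowercase letters), optional script
# (4 letters, titlecase) and optional region (2 uppercase letters),
# separated by underscores.
LOCALE_ID = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:_(?P<script>[A-Z][a-z]{3}))?"
    r"(?:_(?P<region>[A-Z]{2}))?"
)

def parse_locale(identifier):
    """Split a simple locale identifier into (language, script, region);
    missing parts are returned as None."""
    match = LOCALE_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a simple locale identifier: {identifier!r}")
    return match.group("language"), match.group("script"), match.group("region")
```

Two-letter ISO 639-1 codes (such as sw) and three-letter codes (such as gaa for Ga) are both accepted, matching the mix of codes in Table 5.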

7.5 Software and Operating Systems

Software applications in African languages can be seen both as a fundamental way of facilitating greater use of (and easier access to) the technology, and as an aid to those individuals who would also develop web content.

Software localisation for African languages began before the current interest in localising FOSS, though instances of it were few. Examples of localisations in the DOS environment in the 1990s include:

• Somitek Hikadiye: Somali language word processor and spellchecker
• Oromosoft: Oromo language in the Qubee Latin orthography
• Amharic WordPerfect: version developed locally but not for production
• Koma Kuda: word processor for Manding in the N'Ko script (a right-to-left script)
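Software for a right-to-left script such as N'Ko (or Arabic) has to work out the direction of each paragraph of text. A minimal sketch of that first step, using Python's unicodedata module, follows rules P2 and P3 of the Unicode Bidirectional Algorithm in simplified form: the first character with a strong left-to-right or right-to-left property decides the direction. Isolates, embeddings and the rest of the algorithm are deliberately omitted, and the function name is hypothetical.

```python
import unicodedata

def paragraph_direction(text, default="ltr"):
    """Return "rtl" or "ltr" according to the first strongly directional
    character in `text` (simplified UAX #9 rules P2-P3; isolates ignored).
    Falls back to `default` if the text has no strong character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left, e.g. N'Ko, Hebrew, Arabic
            return "rtl"
        if bidi == "L":          # left-to-right, e.g. Latin, Ethiopic
            return "ltr"
    return default
```

Digits and spaces are not strong characters, so a line beginning "123" followed by N'Ko letters is still detected as right-to-left.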

Even where such efforts have not generated sustained activity, they still point to early recognition of the potential and a base of experience to build on.

In recent years there has been more recognition of the need for localised software, and more efforts to localise. Initiatives such as Translate.org.za have led the way among FOSS localisers in Africa, and there has been some interest from the major proprietary software firm Microsoft.

98 See http://yeha.sourceforge.net/ .


Table 6 lists African languages for which there are current active or completed projects to localise the OpenOffice software.

Table 6: OpenOffice Localisation Projects 99

Language | Code | Website | Responsible person
Afrikaans | af | http://www.translate.org.za | Dwayne Bailey
Amharic | am | | Daniel Yacob
Arabic | ar | http://ar.openoffice.org/ | Ossama Khayat
Kinyarwanda | rw | http://wyoming.e-tools.com/Kinyarwanda/OO/stats2.html |
Ndebele, South | nr | |
Northern Sotho/Sepedi | ns | http://www.translate.org.za | Dwayne Bailey
Sotho | st | http://www.translate.org.za | Dwayne Bailey
Swahili | sw | http://www.kilinux.org | Alberto Escudero-Pascual
Swati | ss | |
Tigrinya | ti | http://tinux.sourceforge.net | Mahfuz Ibrahim
Tsonga | ts | |
Tswana | tn | http://www.translate.org.za | Dwayne Bailey
Venda | ve | |
Xhosa | xh | http://www.translate.org.za | Dwayne Bailey
Zulu | zu | http://www.translate.org.za | Dwayne Bailey

In some cases other software has been localised. For example, the non-governmental organisation Open Knowledge Network developed its own localised software for project purposes. Another example is the children's drawing program TuxPaint, which has been localised into Swahili and, more recently, into Xhosa and Venda. Yet another is the Mozilla browser in Luganda.

99 From http://l10n.openoffice.org/languages.html .

In terms of operating systems, there are some projects for localising Ubuntu Linux, and Microsoft, through its Language Interface Pack (LIP) programme of overlay packs with about 80% of commands localised, is working on several major languages.
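The source does not describe how LIP overlay packs work internally. As a hedged, simplified illustration of the general idea of a partial language overlay, the sketch below looks up each interface string first in an African language catalogue, falls back to the base language where no translation exists, and measures how much of the interface is covered. The catalogues, keys, function names and Swahili strings are illustrative assumptions.

```python
# Hypothetical catalogues: a complete base-language UI and a partial overlay.
BASE_EN = {"file.open": "Open", "file.save": "Save", "help.about": "About"}
OVERLAY_SW = {"file.open": "Fungua", "file.save": "Hifadhi"}

def ui_string(key, overlay, base):
    """Return the overlay translation if present, else the base string.
    Every key is assumed to exist in the base catalogue."""
    return overlay.get(key, base[key])

def coverage(overlay, base):
    """Fraction of base-catalogue strings that the overlay translates."""
    translated = sum(1 for key in base if key in overlay)
    return translated / len(base)
```

Here two of the three strings appear in Swahili and "About" falls back to English, so the overlay covers about 67% of this toy interface; a real interface has thousands of strings, but the fallback principle is the same.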

Web interfaces

Another interesting area to consider is localisation of user interfaces for online tools such as search engines. This area is related both to software localisation, in that terminologies and user profiles need to be considered, and to web content, in that it involves use of African languages on websites (which happen also to be user interfaces). An example is the Google program "Google in your language", which includes several African language versions.100

7.6 Other applications

This part will highlight how some other ICTs relate to localisation and will consider technologies and applications of potential importance in localisation. As discussed above (6.6), internationalisation and research on new technologies are opening new dimensions of ICT. Some of these are already being used or explored in Africa.

Mobile technology

Mobile technology in the form of cellular phones has already emerged as a significant ICT in Africa. Cellular phones are increasingly widespread, now much more so than fixed-line phones, and they reach even into rural areas of some countries. Along with this, and with the evolution of the technology to handle text messaging and other functions, there has been increasing interest in localising the user interfaces into African languages. This may be the new growth area for localisation, and its importance certainly increases to the extent that mobile devices and computers can be used interchangeably to share and process information. Shanglee (2004) describes some of the considerations in localising cellphone technology for South African languages.

Among cellphone companies, Nokia appears to be particularly active in the area of localisation,101 with Sony Ericsson and Samsung also marketing local language interfaces in South Africa. An American company, Tegic Communications, has adapted its "predictive text" software, which facilitates input of words using telephone keys and is used on many models of mobile telephone, to several African languages, including Afrikaans, Arabic, and Swahili, with Xhosa and Zulu in development (Senne 2006).
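Tegic's predictive text software is proprietary, and the source does not describe how it works. As a hedged illustration of the general technique of dictionary-based predictive text (one key press per letter, with a word list used to resolve the ambiguity), the sketch below maps words to key sequences. The placement of the Hausa hooked letters on keys is a hypothetical assumption, not any vendor's actual layout. It shows why localising such software involves both assigning extended characters to keys and supplying a word list for the language.

```python
# Standard phone keypad letters, extended with a hypothetical placement
# of Hausa hooked letters on the keys of their plain counterparts.
KEYPAD = {
    "2": "abcɓ", "3": "defɗ", "4": "ghi", "5": "jklƙ",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyzƴ",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}

def key_sequence(word):
    """Digits pressed, one per letter, to enter `word`
    (raises KeyError for characters not placed on any key)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())

def build_index(words):
    """Map each key sequence to the dictionary words it can produce, in list
    order (a real system would rank candidates, e.g. by word frequency)."""
    index = {}
    for word in words:
        index.setdefault(key_sequence(word), []).append(word)
    return index
```

With this layout a plain and a hooked spelling such as "dan" and "ɗan" share the key sequence 326, so the phone must offer both and let the word list and the user choose.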

100 A short list is given at http://lists.kabissa.org/lists/archives/public/a12n-forum/msg00478.html .

101 Nokia has localised "menu text and predictive input" for at least one phone model in Afrikaans, Arabic, and Swahili, and "menu text only" in Zulu, Xhosa, Sesotho, Yoruba, Hausa, and Igbo. See http://www.europe.nokia.com/A4160009 .


Commands in non-Latin scripts of Africa are proven in the case of Arabic, and research is well advanced on Amharic; text messaging in Arabic is also incorporated on many phones used in Arabophone regions.

Audio Dimensions: Text-to-Speech (TTS), and Speech Recognition

There has been interest in TTS among several researchers, since many Africans are not literate. In recent years a few programs have been developed for African languages. The Local Language Speech Technology Initiative (LLSTI) has coordinated development of TTS in Swahili, Zulu and Ibibio with local and international partners in each case. An interesting application of TTS is the Swahili version, which is used for text messages on mobile phones in a Kenyan project originally pioneered by the Open Knowledge Network and the University of Nairobi.

We are not aware of any speech-to-text (STT) software for African languages, though the Nigerian organisation African Language Technology Initiative (ALT-I) has been doing some research on speech recognition of Yoruba.

Geographic Information Systems (GIS)

GIS is seeing increased use in Africa, but to our knowledge no GIS software has been localised in any African languages.

In terms of potential localisation, the designers of the GRASS GIS application hope to see it translated into diverse languages.102 The software is now Unicode-aware.

Machine Translation (MT) and Translation Memory (TM)

The ability to transform thought in writing or speech from one language to another with the assistance of a computer is one of the most interesting uses of ICT in multilingual contexts, but one that has had relatively little attention in Africa.103 The technology in this area is evolving quickly and has connections with and implications for localisation work.

For convenience one might divide it under two headings: machine translation (MT), or the automatic translation between languages by a computer program, which aims at translating speech or text from one language into another, in general or specific settings; and translation memory (TM), which is mostly used as a tool to facilitate new translations based on previous translations of the same or similar text content.

102 Markus Neteler (personal communication, 2005). For more information on GRASS, see http://grass.itc.it/ .

103 The International Association for Machine Translation (IAMT), for instance, is composed of three regional associations, one each for the Americas, Europe, and the Asia-Pacific region, but none in Africa, a continent that by itself accounts for about a third of the world's languages.

MT has been under development for a number of years and encompasses different approaches and technologies, the details of which will not be discussed here. Of particular interest, however, is the subcategory of MT referred to as "shallow-transfer", a simpler approach adapted for translation between closely related language pairs (an example being the "Apertium" open-source translation software104). Like the simpler "computer assisted dialect adjustment" (CADA) programs of a number of years ago,105 this may find significant use among related languages within Africa.

At this time there is not much MT for use between African and non-African languages apart from Arabic (especially Arabic to and from English), for which there is much research, commercial MT software and even online translation available. For the rest of the languages of the continent, there are at this writing several projects, but actual working MT is available only for Xhosa and Pulaar/Fulfulde in pairs with English. The latter have been built and presented online106 through the efforts of Martha O'Kennon, professor emerita at Albion College (U.S.), using Prolog, but they translate only short sentences. Prof. O’Kennon is also collaborating with other individuals on languages such as Akan and Yoruba.

A number of larger-scale projects exist, notably a longstanding one for Swahili called "Salama" under the direction of Arvi Hurskainen, professor at University of Helsinki (Finland).107 The African Language Research Project at the University of Maryland – Eastern Shore (U.S.) has an initiative to research MT for African languages. There are other corpus-building efforts that envision applying their work eventually in MT, such as one called SAY for Amharic (and some non-African languages) at New Mexico State University (U.S.).

There are apparently some MT specialists in South Africa, but we are not aware of any active MT efforts for African languages based in Africa.

Translation memory (TM) has had some application with African languages. The South African company Web-Lingo, for instance, uses a TM program called "Trados" in some of its work. The open-source TM program "OmegaT" has an Afrikaans version.108
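The core operation of a TM tool, retrieving a previously translated segment that closely resembles a new one, can be sketched in Python as follows. This is an illustration under stated assumptions, not how Trados or OmegaT actually score matches: it uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity measure and a hypothetical threshold of 0.7. The Swahili target strings are illustrative only, not vetted terminology.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: English UI strings -> Swahili.
# The Swahili strings are illustrative, not vetted terminology.
MEMORY = {
    "Open file": "Fungua faili",
    "Save file": "Hifadhi faili",
    "Close window": "Funga dirisha",
}


def best_match(segment: str, memory: dict[str, str], threshold: float = 0.7):
    """Return (source, target, score) for the most similar stored segment,
    or None if no stored segment reaches `threshold` (0.0 to 1.0)."""
    best = None
    for source, target in memory.items():
        # ratio() = 2 * matching characters / total characters in both strings
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


match = best_match("Open the file", MEMORY)
if match:
    source, target, score = match
    print(f"{source!r} -> {target!r} (similarity {score:.2f})")
print(best_match("Print", MEMORY))  # None: nothing similar enough
```

A real TM tool would then show the retrieved translation to the translator as a suggestion to accept or edit, rather than inserting it automatically.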

104 See http://apertium.sourceforge.net/ .

105 Jeff Allen, personal communication, 2006.

106 These are accessible via http://mokennon2.albion.edu/language.htm .

107 See http://www.njas.helsinki.fi/salama/ .

108 See http://www.omegat.org/omegat/omegat.html .


7.7 Facilitation of Discussion of Localisation

Beginning in January 2002, a number of e-mail lists and fora have been set up specifically to advance discussion of topics related to computing and the internet in African languages. Previously, African languages and ICT were a subject discussed on other lists, if at all. These dedicated lists bear mentioning, as their existence and functioning have arguably set a number of dynamics in motion. (Table 7 summarizes information on these lists.)

Table 7: E-mail forums on African languages and ICT

Bisharat e-mail forum | Date established | Format & participation | Language | Number of subscribers | Number of postings & notes on topics
Hausa charsets & keyboards, http://www.quicktopic.com/8/H/JxKHyg9ccPUVB | 2001-2-9 | Message board, no subscription necessary | English | 9* | 94; fonts, orthography, keyboards
Unicode-Afrique, http://fr.groups.yahoo.com/group/Unicode-Afrique/ | 2002-1-20 | E-mail subscription list | French | 154 | 1029; orthographies, fonts, Unicode, encoding, keyboards, other projects
A12n-Collaboration, http://lists.kabissa.org/mailman/listinfo/a12n-collaboration | 2002-3-21 | E-mail subscription list | English | 64 | 898; character sets, fonts, keyboards, encoding, technical issues
Ghanaian languages & ICT, http://www.quicktopic.com/16/H/9xffAXi7whnv | 2002-7-9 | Message board, no subscription necessary | English | 4* | 57; fonts, orthography, keyboards
Yoruba language & ICT (fonts, keyboards & applications), http://www.quicktopic.com/15/H/KKgbRqJUAR8 | 2002-7-27 | Message board, no subscription necessary | English | 18* | 267; fonts, orthography, keyboards
Igbo language & ICT (fonts, keyboards & applications), http://www.quicktopic.com/17/H/tCcDxVXHgQxN | 2002-10-17 | Message board, no subscription necessary | English | 9* | 190; fonts, orthography, keyboards
A12n-Forum, http://lists.kabissa.org/mailman/listinfo/a12n-forum | 2003-6-1 | E-mail subscription list | English | 42 | 469; news, localisation, web content
A12n-Entraide, http://lists.kabissa.org/mailman/listinfo/a12n-entraide | 2003-6-1 | E-mail subscription list | French | 18 | 123
Langues Togolaises et les NTIC, http://www.quicktopic.com/25/H/k2zuDzmgxGkc | 2004-1-23 | Message board, no subscription necessary | French | 5* | 28; fonts, language instruction
Langues Sénégalaises et les NTIC, http://www.quicktopic.com/25/H/6KmBx6F8jES | 2004-3-23 | Message board, no subscription necessary | French | 3* | 17; keyboards, orthographies, news
Langues Béninoises et les NTIC, http://www.quicktopic.com/27/H/UbEFBKa7X46Ra | 2004-7-19 | Message board, no subscription necessary | French | 2* | 12; orthography, fonts, text online
Langues Burkinabè et les NTIC, http://www.quicktopic.com/31/H/rhTwJR2T8ar | 2005-5-28 | Message board, no subscription necessary | French | 1* | 3; sample text
PanAfrLoc, http://lists.kabissa.org/mailman/listinfo/PanAfrLoc | 2005-6-15 | E-mail subscription list | English & French | 46 | 173

* Subscription is optional; anyone can read or post.

Selected other e-mail fora on African language localisation

E-mail forum | Date established | Format & participation | Language | Number of subscribers | Number of postings & notes on topics
[email protected] | 2003 | E-mail subscription list | English | ? | ?; encoding of African alphabets
Informatique et langues des deux Congo, http://groups.google.com/group/info-langues-congo | 2005 | E-mail subscription list | French | 18 | ?; issues relating to languages of DRC and RC
Linux2Igbo | 2003 | E-mail subscription list | English | - | -; localisation of Linux OS in Igbo

One thing that has become apparent from Bisharat's experience with mailing lists is the usefulness and importance of this medium for communication and for fostering collaboration on various aspects of using African languages and ICT. This importance has been underscored by the finding that in some cases different groups have been working on localisation in the same country without knowledge of each other. E-mail lists are not the only way to foster communication, but they are inexpensive and effective, and when coupled with traditional conference-based approaches and individual networking, they can really change the environment around a particular question. For this reason the PanAfrican Localisation project has launched a trilingual forum to encourage communication across the continent and across postcolonial linguistic boundaries as well.109

109 There are three e-mail lists, one each in the working languages of English, French, and Portuguese, and a machine translation mechanism to facilitate following all discussions in each language. See http://lists.panafril10n.org/mailman/listinfo/pal-en, http://lists.panafril10n.org/mailman/listinfo/pal-fr, and http://lists.panafril10n.org/mailman/listinfo/pal-pt .


8. Needs for Sustainable Localisation

As localisation of ICT in African languages becomes more important, building on the foundations discussed above, it is also essential to consider what is required to facilitate its ongoing development and the accomplishment of its ends. In effect, localisation, which was introduced at the beginning of this document as an end that meets and anticipates current and future needs of Africa for linguistically diverse and contextualised use of ICT, is also a process. And that process in turn has needs in order to meet that purpose and achieve sustainable results.

This document has sought in part to look at that process as well as the results of it. This section reviews the needs of localisation and localisers.

Two dimensions of the question emerge: the types of localisation, and who is doing the localisation. These are considered below in general and with specific reference to localisers, and to the assessment of needs done by participants in a workshop sponsored by the PanAfrican Localisation project in Casablanca, Morocco on 13-15 June 2005. This is followed by a strategic perspective on needs for sustainable localisation in Africa.

8.1 Needs by Kinds of Localisation and Localiser

As discussed above (2.2), localisation can refer to several concerns. The main ones identified were equipping of systems, content, and user interfaces (software). The localisers in turn were identified (2.4) as falling into three main groups – Africans in Africa, Africans abroad, and foreigners – of which our main concern is those in Africa. This part gives an overview.

Needs by kind of localisation

The technical prerequisites for equipping systems and for other kinds of localisation in extended Latin and non-Latin scripts are in large measure being met through internationalisation of ICT (section 6, above). The main need here, whether or not it is recognised by computer technicians, linguists and localisers in Africa, is greater awareness of available resources and, in some cases, training in use of Unicode.
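As one example of the kind of point such Unicode training might cover: the same Yoruba letter with a tone mark, ẹ̀, can be stored as different code point sequences depending on the keyboard, font tool or software used, so text that looks identical may fail to match in searches or comparisons. The minimal Python sketch below uses the standard `unicodedata` module to show how normalising to NFC makes the forms comparable; the helper name `same_text` is hypothetical.

```python
import unicodedata

# Three code point sequences that a keyboard or font tool might produce for
# Yoruba "ẹ̀" (e with dot below and grave tone mark):
forms = [
    "\u1eb9\u0300",   # precomposed ẹ + combining grave
    "e\u0323\u0300",  # e + combining dot below + combining grave
    "e\u0300\u0323",  # same marks typed in the opposite order
]


def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalisation, not code point by code point."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(len(set(forms)))  # 3: raw comparison treats them as different strings
print(all(same_text(forms[0], f) for f in forms))  # True
print([hex(ord(c)) for c in unicodedata.normalize("NFC", forms[2])])  # ['0x1eb9', '0x300']
```

Because Unicode has no single precomposed character for e with both dot below and grave, even the NFC form remains two code points, which is one reason fonts and input tools for such orthographies need to handle combining marks well.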

Creation and translation of localised content is more demanding, primarily of language skills and of awareness of tools appropriate to the work (such as appropriate fonts, how to encode text in extended Latin and non-Latin scripts, etc.). However, once one begins to talk about text, there are also issues of standardisation of the written form of the language itself (orthographies, spelling, terminologies, etc.). The latter are more properly a concern of applied linguistics, but they illustrate the dynamics of the localisation ecology: unresolved issues over the written form may be a problem for certain kinds of localisation, from its achievement to its reception.

Software localisation gets even more involved, especially in the technical skills required. With regard to language, specialised terminologies for ICT concepts in languages new to the technology are an issue. Efforts to design tools that facilitate translation of software by people with language skills but no technical expertise are a new area.

This begins to get into the area of the varying needs of people involved in localisation in order to participate optimally. Before considering that in more detail, it is important to consider the sustainability dimension, which implies a wider range of concerns and even strategic considerations.

Needs for sustainability

Sustainability – a common term in development – refers to the chances for longevity of a system or achievement, and that in turn depends on the design, how it takes into account factors that affect it, etc.

At the most basic level, the tasks of localisation involve an interface of technology and language. These make sense only when considered together with socio-cultural factors (the user profiles, the ways language is used, cultural appropriateness, etc.). This, as discussed above (section 3), is the core set of dynamics in localisation, but these dynamics are part of a larger environment. Therefore, additional needs of localisers are to understand this context and to have wider support in the environment.

Needs of localisers

One could say that localisation is people to the extent that the process requires their motivation, skills, and organisation. The people who work on localisation of course come from diverse backgrounds with diverse strengths and interests. In general their needs are defined in the context of the localisation effort they are part of, which is to say that the specifics depend on the work being done.

Nevertheless, one might say that in an effort that by definition requires some mix of language, technical, and organisational skills, localisers' potential needs in terms of information, support, and training would be in one or more of those areas.

Localisers in Africa, as a category of people working on localisation, are sure to be the fastest growing of the three groups mentioned above, and any attempt to evaluate localisers' needs in more detail will naturally need to focus on them. In general, one can safely say that localisers in this context will tend to be stronger on the language side than on the technical one, which implies some mix of training focused on the technical aspects of localisation and tools to facilitate localisation.

8.2 Understanding the Needs of Localisers

Understanding what localisers in Africa need in order to start and follow through on a localisation initiative can be approached from two directions: 1) an overview and analysis from outside; and 2) the localisers themselves. Both have value, but the localisers, from their experience, know the material and information needs of the localisation process.


One of the goals of the PanAfrican Localisation workshop in Casablanca was to get feedback from localisers about the needs for localisation. Preparation for this involved outlining anticipated topics and programming workshop sessions to explore them.

Technical and Linguistic, Strategic and Organisational

The initial suggested breakdown for analysing needs for localisation in the PanAfrican Localisation workshop consisted of two broad categories: Technical and Linguistic; and Strategic and Organisational. These categories are explained below.

The thought was that this outline could be expanded upon by localisers themselves. This was the approach taken before and during the Workshop in Casablanca, using the PAL wiki beforehand and then workshop sessions when participants were together.

Technical and Linguistic

The main question under this heading is: What are the technical and linguistic needs for localisation to succeed in Africa? Technical and linguistic factors110 are both fundamental to localisation tasks, and indeed to other treatment of writing and text in ICT. It was felt to be useful to consider these together.

"Linguistic" is used here in the more generic sense ("related to language") rather than in the sense of research into the workings of language, although the latter also has a bearing on localisation work. On the linguistic side there are several issues:

• Standardisation of orthographies
• Variation of language (dialects, varying degrees of interintelligibility)
• Terminology

The technical issues include those encountered in translating software or content. These might involve certain aspects of internationalisation, as well as features that should provide for it.

Strategic and Organisational

The main question under this heading is: What are the strategic and organisational needs of localisation efforts that need to be addressed for them to succeed? It was felt that this easily overlooked category is essential to successful localisation. Too often parallel efforts needlessly duplicate effort, initiatives begun with high hopes lack the vision to follow through, and the problem of lack of resources becomes a block. Additionally, it is important that these efforts find ways to gain support in building a favourable environment for localisation. Therefore, it is necessary to discuss and plan how localisation projects can more effectively organise their efforts and coordinate with each other.

Some of the issues in this category were felt to be common to FOSS work in general.

110 In the framework of the PLETES model this would refer to two points in the localisation dynamic.


Needs as Identified at the PanAfrican Localisation Workshop

In the Casablanca workshop, an effort was made to survey participants' experience and opinions about the challenges they faced, the successes they had, and what they felt they needed in order to be able to do more and better. The method used was brainstorming in small groups, then reporting to the larger group in such a way that each unique idea was ultimately put on a separate card. Then, as a group, we physically put the cards on a blank wall, rearranging them as we went in order to find what seemed to be natural categories. The advantage of this approach of course was … The categories ultimately turned out to be five in number:

• Government – relations, importance of support
• Peer networking – need for and rewards of collaboration in FOSS localisation
• Standardisation – important in several categories, such as orthographies
• Technical issues – aspects of the localisation work itself
• Sociolinguistic issues – language specific issues encountered in localisation

Much of the discussion related to OpenOffice and basic user interfaces, though various other technologies were also mentioned.

Comparison and Summary

Comparing the two lists prompts several remarks.

First, it is interesting that one of the strengths and interests in the localisation area is what we discussed in the Casablanca workshop as "peer networking."

It was a bit of a surprise that the localisers in Casablanca did not address any strategic or organisational issues. However, this may be either a function of the relative ease of localisation projects at this phase of their existence, before such issues would arise, or a topic of more interest from an outside view. On the other hand, the peer networking function is one that …

Some critiques point out the lack of long-term strategies and marketing. How can one provide this kind of perspective for localisation efforts, policy makers, and donor initiatives?

8.3 Analysis of Needs from a PanAfrican Perspective

While localisers' perspectives are centrally important, it is also necessary to consider a range of short- to long-term needs from a strategic viewpoint. In effect, localisers are experts in their own contexts, but not always attuned to larger and longer-term connections in the localisation ecology. Therefore there is also a place for analysis of needs from "above", in the sense of an overview that takes into account connections and commonalities among local efforts that are not readily apparent at their level, and also longer-term trends that even a workshop of localisers might not identify.


Some of these include higher-level support from intergovernmental and donor organisations; better means for communication, so that efforts build on each other rather than duplicating effort in ignorance of each other; and large-scale tools such as databases of information and contacts.

The issue of cross-border languages is one of these strategic issues. At one time conferences on this topic were often sponsored by UNESCO. Presently the need is driven by the potential for localising ICT, so there are various interAfrican and international entities that can sponsor this. However, localisers and their initiatives can initiate contact, and in some cases the need may be as simple as a catalyst or introduction in order to favour cross-border work.

It would be helpful from a planning perspective to have a cross indexing of potentials for localisation in particular languages and the evolving local and regional situations with regard to connectivity, physical access to ICT, etc. – in effect "mapping" the localisation ecology in geographic space. Specifically this would begin to permit the evaluation of key areas of need and highest impact of localisation.

This kind of strategic approach to localisation is lacking and by the nature of local efforts is not likely to be done without some outside help. It is with this in mind that the PanAfrican Localisation project was founded and that its internet presence (website and lists) is being developed.


9. Summary and Recommendations

Localisation in Africa is an emerging concern and process that is, or ideally should be, meeting the increasing deployment around the continent of a rapidly evolving set of ICTs. In the preceding sections we have considered the need and environment for localisation, linguistic and technical backgrounds, recent and current localisation activities, and the needs of localisers and localisation efforts. This section summarises main themes and discusses some recommendations and suggestions.

The recommendations and suggestions are organized in several parts below (9.2-9.9):

• Strategic Perspectives
• Conferences and Workshops
• Training and Public Education on Localisation
• Information Resources and Networking
• Languages, Policy and Planning
• Basic Localisation and ICT Policy
• Africa and ICT Standards for Localisation
• Advanced Applications and Research

9.1 Major Themes

This document has dealt with a number of topics relating to localisation in African languages, including Arabic. Before enumerating recommendations and suggestions relating to those topics, it is worth pointing out some themes that emerge in the overall discussion. There are at least six:

• The importance of African languages and localisation. This is obvious as the premise of this project and document, but it bears restating. The importance of languages can be argued on many levels. The focus on supporting localisation of ICT presupposes a general disposition on the part of governments and populations to preserve and develop African languages. Although it is recognised per the discussion above (4.2) that there may not be absolute unanimity on such a question in any language community, it is assumed that there is inevitably some interest that merits some response in the area of localisation. (The important roles of government are discussed below; see especially 9.2.)

• Systems and connections. Localisation is done in an environment conditioned by other factors and processes, and by the connections and influence among them: sociolinguistics, various kinds of policy, technology internationalisation and standards, evolution of ICT itself, and so on. The importance of understanding these systems – this environment – was the reason for introducing the concept of localisation ecology above and for discussing at length the linguistic, technological, and internationalisation factors. These and their interrelationships are of practical importance for the success and sustainability of localisation.

• Specificity of information. In this report, including the appendices, there is an effort to move from generalities like "African languages" to specifics like which languages in what places, and to begin to account for particular realities found in each country. It is natural for a wide perspective like this PanAfrican one to look for trends and larger issues, but it is equally necessary to keep sight of very specific circumstances, needs, etc. This concern is also reflected in discussions of various factors in the localisation ecology (per the above) and past efforts. Through specificity there is greater clarity on both what the issues and needs are, and what the potentials might be. Overall, this approach informs current activities and the planning and prioritisation for future localisation work.

• Communication. Sharing of information about activities, resources, etc., including peer networking is essential to the success of efforts on local levels that may lack the means and knowledge themselves to accomplish what they hope to do. The electronic forums discussed above (7.7) are of course but one tool. Conferences and workshops are also important. These will all be dealt with further, below.

• Complementary importance of local prerogative and wider vision. Localisation depends on local initiative and knowledge, but these are seldom sufficient to achieve optimal ends. In some cases broader perspectives can facilitate local work, and indeed contribute to planning localisation in ways that benefit the most people and have the best chance of sustainable results. In other cases, broader views can inform local initiatives of realities that they may not be aware of. Both perspectives – local and wider – are necessary and indeed depend on each other (analogous in some ways to the interrelationship of localisation and internationalisation – the former informs the latter, and the latter facilitates the former).

• Vital role of African governments and intergovernmental organisations. Official institutions in Africa, whether on the country, regional, or continental level, occupy central positions in areas that affect localisation such as language, education, and technology policies. They also have the potential to directly support localisation, through practical means and by elaborating their vision of how multilingual computing can develop in their states and across the continent. Localisation in Africa without appropriate governmental and intergovernmental attention is hobbled – there are certain policy roles that can only be handled by governments and others that need government involvement.

9.2 Strategic Perspectives

As a general approach, it may be useful to develop a phased, long-term strategy of emphasis on various aspects of ICT: which ones should be localised in what languages, how, in what order, when, and with what people and support.

To a certain degree the initiative for localisation comes from local groups or projects, but it can be encouraged or catalysed from outside. Moreover, outside help can cultivate interest, offer guidance, or even make connections. For instance, isolated groups or focused projects working on related languages, or on the same language in different countries might be encouraged to combine efforts and share ideas, but may lack the vision or the connections to work in that direction. These are all related to the goals of the PanAfrican Localisation project.

A creative strategy for localisation of ICT in African languages will also need to take account of unique aspects of the nature and distribution of the languages and their current and evolving use in their respective societies. This is particularly the case for sub-Saharan Africa, as the predominantly Arabophone north has been able to benefit from work in many countries on developing computing and internet in Arabic.

The interspersion of speakers of diverse languages, the use of different languages in different social and economic contexts, and the general multilingual nature of most African societies present a situation that is very different from the linguistic and sociocultural profiles of the more technically advanced countries. It is in some ways more comparable to the situation in multilingual South Asia, except that there the longer written traditions and larger speakership of most languages have made for an easier connection with ICT. Africa will therefore have to create its own terms of reference with regards to language and technology (and all aspects of localisation ecology), and ultimately must develop its own approaches to technologies that respond to African sociolinguistic realities and socioeconomic aspirations.

It is not at all clear that the authorities charged with developing language policy and technical policy are ready to play this role. There is in some cases genuine interest (ACALAN, for instance, has promoted discussion of linguistic diversity on the internet), and in others some basic work (such as discussion of keyboard layouts for Ethiopic/Ge’ez in Ethiopia, and applied linguistic research such as by CASAS and NACALCO), but not apparently the capacity (or will?) to articulate the potential of multilingual ICT in Africa or propose practical steps to accomplish it. If one looks at the record of follow through with regard to language policy proposals over the last several decades, the picture is not encouraging.

Nor, for that matter, are most major donors attuned to these realities or these needs. Development agencies by and large have not given much attention to African languages outside of some limited support in adult literacy and more recently in some bilingual education reforms, and localisation is much the same story.

Two immediate questions emerge from this consideration: how to promote interest in strategies and prospects for localisation, and how to build the capacity to develop and pursue them. From that, in turn, a third question arises: where will the vision and expertise to respond to all of the above come from?

It is possible that an African organisation will take the lead on this, perhaps in collaboration with a research organisation such as IDRC or an international body such as UNESCO. Cooperation between external agencies with the technical vision and African agencies with the policy mandate might be an ideal combination, but who will begin by formulating a comprehensive vision (as opposed to general declarations)?

One would hope to see African institutions of higher learning and research in the vanguard of proposing policies and devising strategies for envisioning futures and building skills; however, this is not yet apparent. A further possibility lies with African academics in northern universities, along with other Africanist scholars. These academics and the institutions of which they are part, in partnership with African institutions, may be able to positively influence the evolution of information technology for all languages in Africa.

Another hope is that local efforts to translate software, in addition to achieving their ends and thereby introducing dynamic new elements into the discourse on ICT in Africa (that is, localised FOSS applications), can build momentum and encourage a broader response. This would require very good communication (see below, 9.5) as well as organisation. Still another angle that should not be overlooked is the commercial sector. Microsoft in particular has invested in a certain amount of localisation of its software in Africa in collaboration with government agencies in certain countries. This represents strategising on a different model, but one that may have lessons in terms of approaches. Cellphone companies that provide for African language SMS are another category of company in the field. Last but definitely not least are African companies involved one way or another with localisation – even though their number is not large, they may represent a growth area. Ideally, one would hope for general agreement on the importance of localisation among government agencies, donors, non-governmental organisations, and businesses involved in ICT in Africa, with the intent of facilitating it in as harmonised a way as possible.

A Fundamental Strategic Question

Before going on to discuss other recommendations and specific strategies, it is worth taking a moment to step back. On the part of governments it may be instructive to ask a fundamental question: Do we want these languages to continue to be viable for future generations?111 This may seem a very stark question at a time when language extinction in Africa and worldwide is a common topic, but it serves to highlight the bottom-line issues at stake. An answer in the affirmative leads to other questions and indeed imperatives, some of which connect directly to the subject of localisation of ICT and the need for coherent strategies to favour it.

9.3 Conferences and Workshops

Localisation in Africa involves several factors on one hand and various agencies, organisations and individuals on the other hand. As ICT rapidly evolves, the need for coordinating efforts, planning development, and training people is ever present.

The inevitable workshops and conferences that will be held for such purposes should – aside from being sure to include all "stakeholders" – build on each other’s work and not lose sight of certain important concerns. There is a reason for putting it this way: conferences and workshops are often organised with little or no reference to previous ones, and when there is reference, it is too often not accompanied by any substantive connection or building on previous efforts.

In that context one might propose, first of all, a strategic plan for localisation meetings. Such a plan could be devised as a framework for reference by diverse organisations, and perhaps could be designed and implemented at a high level with the participation of some key organisations and agencies. This would require some broad agreement on the goals of localisation and of the meetings, but the plan should focus on a cumulative process rather than on predicting the outcomes of that process.

Four broad concerns related to localisation could benefit from a coordinated series of meetings (listed below from the local level up):

111 Helen Ladd, professor of public policy and economics at Duke University, posed a similar question regarding the South African government: "...part of the broader language policy they need to grapple with is should all eleven [official] languages remain as viable languages?" (Aziz 2004). In other words, she was not talking about languages in danger of extinction, for which such questions cannot be escaped, but rather about the official and widely used languages of the country. This report proposes that other African countries ask themselves similar questions.


1. Localisation workshops, or technical workshops for software (and/or content) localisers. In addition to workshops such as the PanAfrican Localisation workshop in Casablanca and the "Africa Source" events organised by the non-governmental organisation Tactical Tech, it would be useful to have regional, country-specific, and even language-specific meetings, depending on the particular needs and interest in various parts of the continent. The purpose would be to train specialists in aspects or phases of the localisation process, from basics like Unicode and content creation, to creation of locales, to translation of diverse software applications, to follow-up issues in localisation, and advanced applications (concerning the latter, see below, 9.9).

2. Application and adaptation of localisation for development ("digital divide") and business environments, including content development. Ultimately, one of the main purposes of localisation is to help in bridging the "digital divide," but exactly what that means in particular cases may need to be explored in meetings of localisers and development agencies. An interesting example is the internal process of the Open Knowledge Network, which attempts to use diverse languages and provide for sharing of information among various partner agencies. In the longer run, as more localised software is available and the potential for localising even specialised applications is more real, it is likely that broader interagency consideration of localisation in ICT4D contexts will be helpful and indeed necessary.

3. Language issues and ICT. This category includes language planning "expert meetings" to deal with questions relating to how to handle multidialectal languages, cross-border languages, and language clusters in localisation and ICT generally. Such meetings would be along the lines of the meetings facilitated by UNESCO in the 1960s and 1970s, or a smaller one in Okahandja, Namibia in 1996,112 but with the added context of ICT. There is evident need for more deliberation on aspects of language use in computing and on the internet. In some cases there are basic matters such as orthographies or a nexus of issues concerning the relationship of dialects and closely related languages, and the possible selection of standardised versions for various kinds of work. In others, the representation of African languages in international standards such as ISO-639 and in locales (CLDR) may be an issue. Terminology development and harmonisation may be a further matter for consideration. (See also 9.6 and 9.8, below.)

4. Strategic planning for localisation in Africa. This category relates closely to the discussion above (9.2), dealing with visioning and planning for the broader picture over the long-term, on continental, regional and country levels as appropriate. Sessions concerning ICT at the WSIS African planning meetings in Bamako in 2002 and Accra in 2005 were along these lines. This category could include meetings to plan other conferences and meetings in categories 1-3, above. It might also include less formal meetings among high-level decisionmakers.

In some cases such meetings can be organised as part of larger events or co-located with other meetings where one might hope to optimise participation and minimise extra travel. The topics and purposes of the meetings will vary according to the local, country, regional, and continental needs.

Geographic Scope and Strategy for Meetings

In addition to Africa-wide meetings, a PanAfrican strategy for conferences and workshops in the above areas would need to pay particular attention to facilitating and sponsoring workshops on national and regional levels. In fact, at this time it may be best to prioritise regional meetings, with secondary attention to periodic continent-wide meetings. Such regional meetings would also include some with specific working languages other than English – notably French, but also Arabic113 and Portuguese.

112 See http://www.bisharat.net/Documents/ .

Giving localisers and others who are interested in African language computing and internet the chance to address issues particular to their circumstances and the languages they localise in would better support local efforts and also enrich periodic continent-wide meetings. Maintaining communication on a PanAfrican basis can also be facilitated by use of email lists and websites, such as those set up by the PanAfrican Localisation project.

There has been discussion, for instance, about a possible conference on aspects of localisation in Nigeria, which could be a place to begin such efforts. Several different organisations are working on various aspects of localisation, from translating software, to developing keyboards, to text-to-speech software, to speech recognition, and perhaps other areas as well. These include efforts focusing on the three main languages in Nigeria, namely Hausa, Yoruba and Igbo, with some notable efforts concerning other languages such as Ibibio.

Any such initial regional meeting in one part of Africa could provide experience for similar meetings in other parts of the continent.

Beyond Africa there has been discussion in IDRC of the possibility of a global conference on localisation, including Asia (notably the PAN Localisation Project), Africa, and perhaps indigenous communities of the Americas and Oceania. This level of meeting is productive in permitting wider exchanges and networking that expand local, regional, and continental networks.

9.4 Training and Public Education on Localisation

Education for localisation – that is, training of localisers and public education on localisation – is a set of concerns beyond workshops and the declarations often issued by such meetings.

Training

In any field it is common to hear proposals for more training to build skills that enable people and teams to achieve certain ends. Localisation is no exception. Topics of training could vary, but, as is the case with planning workshops, attention should be given to how these fit into a larger strategy of skill development and production of localisation.

In terms of intended beneficiaries, training for localisation can focus on people already involved in localisation initiatives, people who would like to be but have little more than the motivation to begin with, and people who are in neither group but should know something about localisation (such as people planning ICT4D/E projects or setting up telecentres). In the course of organising the PanAfrican Localisation workshop in Casablanca, for instance, we received some inquiries from people interested in receiving training in aspects of localisation.

113 It is our understanding from previous experience that many people working on localisation of Arabic use English or French as working languages. Nevertheless, the possibility of working in Arabic should be considered.


Investing in training for the future

Most discussion of localisation and meetings about localisation focuses on immediate projects and measurable results in the short to medium term. This is the case in terms of training for localisation (or related areas such as ICT4D) as much as it is for other goals.

When considering issues such as sustainability of localisation and the potential use of advanced ICTs with African languages (on the latter, see below 9.9), a longer term need for high degrees of professional skills is apparent. Although it goes well beyond the aims of this project, it is worth calling for an investment in educating a generation of African experts in localisation and in language and computer science. These are interdisciplinary areas getting increased attention elsewhere in the world, but not so much in Africa. Furthermore, there are few Africans involved in discussing issues of internationalisation of ICT, for instance, let alone involved in research on machine translation or speech recognition. Unless this is proactively addressed now by institutions and donors with the means to do so – providing scholarships, investing in research programs – Africa will remain handicapped over the long term in areas of multilingual ICT in which it can arguably benefit and contribute the most.

Public education

The "public" in public education on localisation includes several groups: computer users in Africa in general, with particular attention to people working in some technical capacity but not connected formally with localisation, and also others occupying what might be called key decisionmaking positions within the localisation ecology (e.g., policymakers, educators, business people, etc.). Such an approach could help raise awareness about localisation generally, increase knowledge about specific aspects of ICT, and even motivate action.

Public education can be accomplished through conventional public relations and development communications approaches. The internet – websites and mailing lists – can of course be used to advantage (see also 9.5, below). However it may also be useful to develop a kind of public education campaign to increase awareness and attract the attention of media in Africa.

One example would be the organisation of thematic "Years of Localisation in Africa" campaigns to focus attention on specific aspects of localisation and multilingual ICT in Africa. The topics could include, for example, Unicode, locales, digitising texts for dissemination via the web, advanced applications, and so on. Suggested themes for the next two years follow:

• "2007 Year of Unicode in Africa," which would be intended to raise awareness of and discussion about use of Unicode for African languages

• "2008 Year of Locales for African Languages," which would be intended to encourage filing of locales for African languages

It is suggested to begin with Unicode, a topic needing attention that is more a matter of awareness than one requiring conferences and measurable output. Each year would have its own focus while the next year’s theme is being planned.


9.5 Information Resources and Networking

This part focuses on providing structures for information for localisation, the means for learning and discussion about localisation throughout the environment affecting localisation, and ways to facilitate and enhance peer networking among localisers themselves. In a sense these communication strategies might be called meta-strategies in that they are intended to help facilitate the development of, access to, modifications in, and implementation of other strategies discussed in this section.

Various resources are available online for localisers, about FOSS, and even about African languages. However, there has until now been, with only a few exceptions, a lack of adequate resources focusing on localisation in African languages, whether in general or with regard to specific languages. The exceptions include some resources dedicated to Arabic and Ethiopic/Ge’ez, and a few others like Translate.org (there being only a few exceptions to the latter). This indicates a need for web resources that target the needs of localisers and localisation in Africa.

As far as e-mail lists go, there are several dedicated to one or another aspect of localisation and multilingual ICT in Africa (as discussed above, 7.7). Nevertheless, these are somewhat specialised and are divided into lists that use English or French as the working language.

Beyond such resources, which facilitate finding and exchanging information, are people and their networks. The cultivation and enhancement of localisation networks is a wider and, one might say, more dynamic need. Networks are served in many other ways as well, including, for example, meetings as discussed above. However, the internet provides a ready way to maintain networks as well as to enhance communications.

Role and plan of the PanAfrican Localisation project

In response to this situation, one of the purposes of the PanAfrican Localisation project is the development of an online resource for African localisers. After consideration of different approaches, an approach including a website based on a wiki along with associated e-mail lists was adopted. The website is intended to be the first place one goes to find information relating to localisation in a particular language or country, and a place with links to various relevant resources and tools. It targets not only localisers, but also others in positions to support localisation or promote multilingual ICT as part of, for instance, "digital divide" projects. The site includes the following elements:

• This document
• Profiles (which double as appendices to the document) of
  o Major languages
  o Writing systems (alphabets)
  o Countries
  o Inter-African
  o Localisation resources
• Wikigroup communities for each country (and potentially other groups)
• Diverse other sections


The profiles are intended to offer information in the context of different perspectives on localisation. Languages (the core issue of localisation) cross borders; countries (where policies are made) include diverse languages; scripts (the form of text used in ICT) are used for many languages and modified in various ways; various organisations deal with languages and localisation across the continent; and there is a range of general localisation resources that are useful for African localisation. These are detailed in the Appendices (section 12).

Wikigroups are an attempt, first of all, to involve localisation and multilingual ICT groups at the country level. This is a long-term strategy for hosting flexible-format local web spaces that are independent, yet linked to and searchable from the larger website. Other kinds of specialised wikigroups relating to African languages and localisation will be added later.

The project has initiated three new e-mail lists, one each for three working languages – English, French, and Portuguese – to facilitate communication and networking among localisers across the continent. These are linked to each other by a machine translation program intended to make the system truly PanAfrican (even if the translation results are not perfect). It is hoped also to link these and other resources via RSS to the relevant wikigroups.
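The cross-list arrangement described above can be pictured as a simple relay: a post arriving on one list is copied to the other two, each copy passing through a translation step. The following is a minimal Python sketch under stated assumptions, not the project's actual implementation: the `Message` type, the `relay` function and the `fake_translate` stand-in are all hypothetical, and a real deployment would plug in an actual machine translation service.

```python
from dataclasses import dataclass
from typing import Callable, List

# Working languages of the three lists (ISO 639-1 codes).
LIST_LANGUAGES = ["en", "fr", "pt"]

# Hypothetical translator interface: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


@dataclass
class Message:
    sender: str
    lang: str                         # language the body is written in
    body: str
    machine_translated: bool = False  # True for relayed copies


def relay(msg: Message, translate: Translator) -> List[Message]:
    """Return one translated copy of msg for each of the other lists."""
    if msg.machine_translated:
        return []  # never re-relay a copy, or posts would loop between lists
    copies = []
    for target in LIST_LANGUAGES:
        if target == msg.lang:
            continue  # the original list already carries the post
        copies.append(Message(
            sender=msg.sender,
            lang=target,
            body=translate(msg.body, msg.lang, target),
            machine_translated=True,
        ))
    return copies


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real MT engine: it only tags the text."""
    return f"[{source}->{target}] {text}"


if __name__ == "__main__":
    post = Message("localiser@example.org", "fr", "Bonjour à tous")
    for relayed in relay(post, fake_translate):
        print(relayed.lang, relayed.body)
```

Marking relayed copies as machine-translated serves two purposes in this sketch: readers can tell that the text is not the original, and the relay can refuse to forward a copy again, which keeps messages from bouncing endlessly between the three lists.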

Taken as a whole these are intended to fulfil needs for information resources and networking for localisation. These will certainly evolve, and link to new resources addressing particular localisation issues and needs in Africa, and together promote a dynamic evolving space for African language localisation.

9.6 Languages, Policy and Planning

This part deals with language policy and planning as they relate to localisation. Any discussion in this area necessitates consideration of countries – the level at which policies are made and planning done – as well as the languages themselves.

Localisation and African Linguistic Diversity

A strategy to support localisation in Africa must begin with a sense of the scope of the project, and awareness of where that support might go beyond focus on existing initiatives to engage in a proactive approach of outreach to potential localisers where there are yet no initiatives. This is one of the reasons that Appendix I (Section 12.1) lists what can be considered the major African languages and Appendix III (Section 12.3) lists all the countries in Africa. What is named is not as easily overlooked, and by starting with some specifics there is a more tangible place to build from.

With limited means available for localisation, it will be necessary to prioritise efforts and resources, whether on the language, country or broader levels. This may tend to disfavour less widely spoken languages including endangered languages, at least in the short-term. It may be that the best strategy for languages not prioritised for software development, at least in the beginning, is to incorporate spellcheckers for them into software that is localised for more widely spoken languages. That sort of interim solution is already a significant challenge for languages with relatively few speakers and resources, but arguably a realistic goal. Also, since Africa’s linguistic diversity lives in multilingual contexts, such combinations with more and less widely spoken languages might actually be natural.
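The interim approach suggested here amounts to letting one application consult several word lists at once. The following Python sketch assumes plain word-list spellchecking (real checkers such as Hunspell use affix rules and are considerably richer); the class name and the tiny word lists are hypothetical and for illustration only. Accepting a word if it appears in any active language's list reflects the multilingual use described in the text.

```python
from typing import Dict, Iterable, List, Set

PUNCTUATION = ".,;:!?\"'()"


class MultiSpellChecker:
    """Checks text against the word lists of several languages at once."""

    def __init__(self, wordlists: Dict[str, Set[str]]):
        # Store every list in lower case so checking is case-insensitive.
        self.wordlists = {
            lang: {w.lower() for w in words} for lang, words in wordlists.items()
        }

    def unknown_words(self, text: str, languages: Iterable[str]) -> List[str]:
        """Return the words not found in any of the active languages' lists."""
        active = [self.wordlists[lang] for lang in languages]
        unknown = []
        for token in text.split():
            word = token.strip(PUNCTUATION).lower()
            if word and not any(word in words for words in active):
                unknown.append(word)
        return unknown


if __name__ == "__main__":
    # Tiny illustrative word lists, not real dictionaries.
    checker = MultiSpellChecker({
        "ha": {"sannu", "gida"},
        "en": {"hello", "house"},
    })
    print(checker.unknown_words("Sannu, house gidda!", ["ha", "en"]))  # ['gidda']
```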

Standardisation of Languages and Orthographies

A related issue, both within individual African countries and among neighbouring countries that share the same (cross-border) language(s), is that of standardisation. In some cases, where there are dialect variations, it might be helpful for localisation (and eventual computer users) for standard versions, or at least orthographies and terminologies, to be identified or developed where they are not already chosen. Generally this is something undertaken by governments (such as in Uganda for Runyakitara), but in at least one case (N’ko for Manding languages) efforts to develop a standard are local.

As discussed in a previous section (4.3), standardised orthography is an issue that is not settled for all languages in Africa, even in the case of some languages with more established writing systems. This has been a recurrent issue in Africa since independence, along with the harmonisation of transcriptions for related languages, languages that cross borders, and diverse languages within each country. These issues must receive continued attention and resolution so that the languages can be consistently used in all text-based computer applications and content.

Educational Policy

Recommendations concerning educational policy with respect to languages of instruction and the teaching of African languages in schools are beyond the purview of this report. However, it is clear that localisation and the use of African languages in education can benefit each other. Installing localised software on school computers would offer students different ways to learn about and interface with the technology. Pluriliterate students can more effectively interact with and contribute to building a rich array of multilingual African web content and software.

Externally funded ICT4E projects, like the One Laptop Per Child project, can also have a positive impact on localisation and language in education programming.

Beyond schools, the potential for use of ICT in basic literacy and in first language literacy of those educated only in ELWCs can be explored.

It is also worth exploring possible connections between all of the above and the development of various online dictionaries and second-language instructional modules. Could African language programs at Northern universities collaborate with applied linguistics and educational programs in Africa to create online language resources for Africans?

9.7 Basic Localisation, and ICT Policies and Programs

Another main support for all aspects of localisation could be through other activities relating to ICT, notably country-level ICT policies but also FOSS user groups and even development projects.


Country ICT Policies

Country-level policies for development of ICT, such as NICIs, could make more specific mention of, and commitment to, localisation. This could include moral or material support for: development of web content in indigenous languages; translation of software; participation in elaboration of standards that affect use of the languages; and training relating to language and computing.

Governments obviously have enormous potential influence by their example – for instance the languages in which their content is posted (South Africa is an interesting example in this regard, though some other governments have indigenous language content). Also, governments could insist that foreign-funded ICT4D projects take into account the multilingual nature of the populations in choice of software, fonts and keyboards for computers they deploy.

Role of International Development Organisations

A number of donor agencies and other organisations that are involved in international development have taken an interest in various ICT4D and ICT4E projects. Although a few such organisations, such as IDRC, OneWorld, and GeekCorps, have paid attention to aspects of localisation in web content and computer interfaces, this is an area in which such organisations are well placed to do a lot more. Several possible areas of action are listed below:

• Equipping systems in telecentres (per the discussion in section 2 about types of localisation)
• Diverse language content
• Designing localised web content to be printable for community reading in locations distant from telecentres
• Seeking collaboration with localisation groups concerning localised software where they exist in beneficiary countries
• Supporting new localisation efforts.

Bringing FOSS Communities More Fully Into Localisation

The oft-mentioned distance between linguists and computer technicians risks being duplicated between some FOSS user groups and localisers in Africa. It is a good time to build bridges. To that end it is worth proposing a program to identify and contact FOSS groups in various countries about their interest in developing agendas for localisation.

Linking Different Localisation Currents: FOSS, ICT4D, and Commercial

There seem to be two or three different levels in operation relating to localisation that should be linked. One is localisation focused mainly on the language communities within specific countries or groups of countries – this is the level on which the PanAfrican Localisation project and other localisation initiatives have tended to operate. In general these are local non-governmental initiatives, but any government-sponsored localisation

Another is localisation in specific development contexts, such as what the Open Knowledge Network project has done in some parts of Africa as well as South Asia. The two overlap in Africa itself, but the former focuses on general FOSS applications and also may involve Africans overseas, while the latter involves development agencies designing and implementing ICT4D projects. Both in principle should involve linguists and computer technicians. These together could be linked in a way that promotes cross-fertilisation of ideas and fruitful collaboration.

A third level is that of business and commercial interests. On the one hand there are international proprietary software firms, notably Microsoft, which are interested in localisation in certain markets, and also African software companies. On the other hand, there are firms in other industries that may look at localisation of content or interfaces.

Linking the latter with the previous two might present challenges in terms of reconciling approaches, but also has the potential to benefit localisation in unforeseen ways. For broader impact and longer term sustainability of localisation, seeking ways for coordination of or cooperation among diverse groups with common interest in localisation might be very positive for African languages.

FOSS, Proprietary Software, and Limited Localisation Resources

In ICT and language work as much as in other domains, Africa tends to have limited resources. Another way of linking different currents of localisation might be to promote a space of collaboration among proprietary and open source efforts on certain basic resources for localisation in minority languages – a kind of "historic compromise." The idea would be to favour development of resources like dictionaries in ways that are most likely to benefit the most people as early as possible. Such an agreement would also be intended to favour the least widely spoken languages.

Mobile Technology

The rapid spread of cellphones in Africa and the miniaturisation of computer technology is beginning to open a new dimension of localisation, and as such deserves to be considered in policies and programs for localisation. This technology is discussed further below (9.9).

9.8 Africa and ICT Standards for Localisation

As discussed in previous sections, standards are among the requirements for success in localisation. They facilitate use of diverse languages in ICT, translation of software, and the production and use of linguistically diverse web content. Recommendations concerning language-related standards were mentioned above (9.6). This part focuses on some technology-related standards affecting localisation and multilingual computing in Africa, ranging from keyboards to coding.

A general recommendation here is for greater involvement by governments and interAfrican agencies in all levels of standards making that affect localisation in African languages, including ISO and its relevant technical committees.


International Standards for Language and African Participation

The issue of ISO and African participation – or lack of it – in making standards for languages was brought up above (9.3). There is a need to find ways to facilitate and encourage African governments and standards bodies, as well as language and applied linguistics agencies, to take a more active interest in these issues. It is a key part of the localisation ecology, and until now it has been shaped almost entirely outside of Africa.

One particularly important issue is language coding under ISO-639 (see above, 6.3): in some cases new codes are needed, and there are also the new parts of the standard in development (ISO-639-4 to -6). Without input from Africa and from experts familiar with the realities of African languages and localisation, these needs may not be identified. A way should be found to analyse the system of codes as it applies to African languages and language groups in order to assure appropriate coverage for locales and diverse localisation needs.
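As an illustration of the kind of analysis called for here, the sketch below sorts a handful of languages by the ISO 639 codes available to them. It is a minimal Python example that assumes a small hand-made sample table (the function names are hypothetical); a real analysis would draw on the full ISO 639-3 code tables. It also applies the convention, used in BCP 47 language tags, of preferring a two-letter ISO 639-1 code where one exists and falling back to a three-letter code otherwise.

```python
from typing import Dict, List, Optional, Tuple

# (ISO 639-1, ISO 639-3) codes; None where a code does not exist.
CodePair = Tuple[Optional[str], Optional[str]]

# Small illustrative sample only, not a complete or authoritative table.
SAMPLE: Dict[str, CodePair] = {
    "Hausa": ("ha", "hau"),
    "Yoruba": ("yo", "yor"),
    "Igbo": ("ig", "ibo"),
    "Ibibio": (None, "ibb"),
}


def preferred_code(pair: CodePair) -> Optional[str]:
    """Shortest available code (two-letter first), or None if there is none."""
    two_letter, three_letter = pair
    return two_letter or three_letter


def coverage_report(table: Dict[str, CodePair]) -> Dict[str, List[str]]:
    """Group languages by the kind of code available to them."""
    report: Dict[str, List[str]] = {
        "two_letter": [], "three_letter_only": [], "no_code": [],
    }
    for name, (two_letter, three_letter) in sorted(table.items()):
        if two_letter:
            report["two_letter"].append(name)
        elif three_letter:
            report["three_letter_only"].append(name)
        else:
            report["no_code"].append(name)  # a coverage gap to raise with ISO
    return report


if __name__ == "__main__":
    for category, names in coverage_report(SAMPLE).items():
        print(category, names)
    print(preferred_code(SAMPLE["Ibibio"]))
```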

Another international standard area of concern is Unicode. Although the character ranges already encoded for major scripts used in Africa are quite extensive, there may be additional characters needed. In addition, there are some minority scripts still unencoded. At the present time, however, there is almost no African representation in the standards process for this area, leaving the deliberations and decisions to other countries.
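A brief illustration of why encoding details matter for African orthographies: some letters, such as the hooked consonants used in Hausa, have their own code points, while some Yoruba letters carrying tone marks exist only as a base letter plus combining marks. The Python sketch below uses the standard `unicodedata` module to name a few characters and to show that NFC normalisation reduces differently typed sequences for Yoruba high-tone e with dot below to one canonical form. The choice of examples is ours, for illustration only.

```python
import unicodedata

# Letters from Hausa (hooked consonants) and Yoruba (dot below) orthographies.
for ch in "\u0253\u0257\u0199\u1eb9":  # ɓ ɗ ƙ ẹ
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Yoruba high-tone e with dot below has no single precomposed code point,
# so it must be stored as a base letter plus combining mark(s). Input
# methods may emit the marks in different orders or partly precomposed;
# NFC normalisation maps all of these to one canonical sequence, which
# matters for searching, sorting and spellchecking.
variants = [
    "e\u0323\u0301",   # e + combining dot below + combining acute
    "e\u0301\u0323",   # e + combining acute + combining dot below
    "\u00e9\u0323",    # precomposed e-acute + combining dot below
]
normalised = {unicodedata.normalize("NFC", v) for v in variants}
print(len(normalised))  # 1: all three variants collapse to one form
print([f"U+{ord(c):04X}" for c in next(iter(normalised))])
```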

Locales

Locales are categorised by internationally standardised codes (language, country) and are fundamental to localisation. Yet there are still relatively few languages with locales. This is an area that needs not only more attention, but also coordination in the case of some language groups or "macrolanguages" where there are different code options. A campaign to increase the number of African language locales filed should be planned and coordinated if possible, with these concerns in mind.
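To make the structure of a locale identifier concrete, here is a minimal Python sketch, assuming the POSIX-style `language_COUNTRY` form (as in `ha_NG`; BCP 47 and CLDR write the separator as a hyphen, `ha-NG`). The helper names are hypothetical, and the sketch ignores suffixes such as `.UTF-8` or `@modifier` that POSIX locale names may carry. Because languages cross borders, one language generally needs one locale per country in which it is used, as the Hausa example shows.

```python
import re
from typing import Iterable, List, NamedTuple

# language (2 or 3 lower-case letters) + "_" + country (2 upper-case letters)
LOCALE_PATTERN = re.compile(r"([a-z]{2,3})_([A-Z]{2})")


class Locale(NamedTuple):
    language: str  # ISO 639 code
    country: str   # ISO 3166-1 alpha-2 code


def parse_locale(tag: str) -> Locale:
    """Split a POSIX-style locale id such as 'ha_NG' into its two codes."""
    match = LOCALE_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a language_COUNTRY locale id: {tag!r}")
    return Locale(match.group(1), match.group(2))


def locales_for(language: str, countries: Iterable[str]) -> List[str]:
    """One locale id per country in which a (cross-border) language is used."""
    return [f"{language}_{country}" for country in countries]


if __name__ == "__main__":
    print(locales_for("ha", ["NG", "NE"]))  # Hausa in Nigeria and Niger
    print(parse_locale("ibb_NG"))           # Ibibio: three-letter code
```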

Keyboard Standards for Africa

As discussed above (7.3), keyboard layouts for African language needs (and in one case a production keyboard for Nigerian languages) are getting increasing attention at the local level. The question is how to encourage a degree of standardisation in layouts that would benefit both localisers and computer users. Keyboard layouts may of course be independent of other software, but they are also a consideration in software localisation and even in keyboard manufacture.

Current localisation efforts addressing the issue of keyboards should begin to consider the larger and more long-term keyboard evolution issues, at least to the extent of familiarising themselves with other keyboard layouts already in existence in the same regions or countries, or for the same or similar languages. The general evolution of African keyboards might ideally be to accommodate multiple language usage in a particular country, group of countries, or region.

Another area is the interest in keyboard layouts as part of localising production software such as OpenOffice, or simply in facilitating input in African languages in non-localised software. How can these be coordinated? Such discussions might provide the basis for arriving at proposed standards.
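Keyboard layouts for extended Latin orthographies often rely on "dead keys" (a key that modifies the next keystroke) and on combining diacritics. The sketch below simulates both. The dead key `;` and its key assignments are hypothetical and are not a proposed standard. Unicode NFC normalisation then turns a base letter plus a combining mark (e.g. Yoruba `e` + U+0323 COMBINING DOT BELOW) into the precomposed character where one exists.

```python
import unicodedata

# Hypothetical dead-key layer: after the dead key ';' the next key yields a
# special letter. The key assignments are illustrative, not a proposed standard.
DEAD_KEY = ";"
DEAD_MAP = {"b": "ɓ", "d": "ɗ", "k": "ƙ", "y": "ƴ",
            "n": "ŋ", "e": "ɛ", "o": "ɔ"}

def type_keys(keys):
    """Simulate a key sequence passing through the dead-key layer,
    then compose base letter + combining mark pairs via NFC."""
    out = []
    pending = False
    for k in keys:
        if pending:
            pending = False
            if k in DEAD_MAP:
                out.append(DEAD_MAP[k])
            elif k == DEAD_KEY:
                out.append(DEAD_KEY)        # ';;' types a literal ';'
            else:
                out.append(DEAD_KEY + k)    # no mapping: emit both keys
        elif k == DEAD_KEY:
            pending = True
        else:
            out.append(k)
    if pending:
        out.append(DEAD_KEY)                # trailing dead key typed alone
    # NFC turns e.g. 'e' + U+0323 into the precomposed U+1EB9
    return unicodedata.normalize("NFC", "".join(out))

print(type_keys(";ba;d"))       # ɓaɗ
print(type_keys("o\u0323"))     # ọ (U+1ECD)
```

Where Unicode has no precomposed form (e.g. `ẹ` with an acute tone mark), NFC leaves the combining mark in place. Fonts and software must therefore handle combining sequences as well as precomposed letters.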

At a higher policy level, some thought needs to be given to who would have the authority to address this kind of question and how they would work, whether government agencies only, specialist linguists, commercial interests, or some combination of these. As part of this, again, it is important to look at what is already being done and used in diverse circumstances.

In developing keyboards for wider and long-term use, it is also essential to keep in mind some basic givens as well as the ISO/IEC 9995 guidelines. The first is to understand the habits of computer users with regard to keyboard use. It is reasonable to assume that multilingual and pluriliterate people in Africa will use more than one language over the course of their computer use, and perhaps during a single session at a computer. It is also important to remember that many computers will be used in public access points such as cybercafés and telecentres, meaning that multiple language preferences may need to be provided for at any single station. These considerations may not seem pressing at the moment, when almost all one sees is software and websites in ELWCs. Once localised software and interactive content in African languages become more widely available, however, the potential for diverse use of the technology opens up. These needs should be anticipated.

In this context, a dual focus is probably called for: what is most effective and useful for end users in particular situations, as just discussed, and an appropriate "massification" for the market, meaning how to satisfy, with a single keyboard, the needs of as wide a user base as possible. In the end, successful keyboards for Africa must balance these criteria in appropriate measure.
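The "massification" question can be framed as a simple set computation. A layout meant to serve several languages must offer the union of their extra letters, and the letters shared by two or more languages are natural candidates for the most accessible key positions. The inventories below are partial and illustrative (lowercase only; conventions also vary by country), and the function name is hypothetical. ISO/IEC 9995 placement rules are not modelled.

```python
# Partial, illustrative inventories of extra lowercase letters (not complete
# orthographies; conventions also vary by country).
INVENTORIES = {
    "Hausa": set("ɓɗƙƴ"),
    "Fulfulde": set("ɓɗŋɲƴ"),
    "Wolof": set("àéëóñŋ"),
    "Yoruba": set("\u1eb9\u1ecd\u1e63"),  # ẹ ọ ṣ
}

def layout_requirements(languages):
    """Return (all letters a shared layout must offer,
    letters needed by two or more of the given languages)."""
    needed = set().union(*(INVENTORIES[lang] for lang in languages))
    shared = {ch for ch in needed
              if sum(ch in INVENTORIES[lang] for lang in languages) > 1}
    return needed, shared

needed, shared = layout_requirements(["Hausa", "Fulfulde", "Wolof"])
print(len(needed), sorted(shared))
```

In this toy example, a single regional layout for the three languages needs eleven extra letters, four of which are shared.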

Finally, a forward-looking strategy should also consider alternative means of input, from the implications of emerging LED keyboards to graphics tablets and speech-to-text (the latter is considered below, 9.9).

9.9 Advanced Applications, Tools and Research

Localisation, as applied to software, most often refers to computer software for general use. It is also important to look beyond that definition: to the localisation of specialised and advanced applications, to the development of software tools that let one do more with language, and to research that develops these and advances the technology generally with regard to its use in and for African languages. In other words, some specialised and advanced uses of ICT can be the object of localisation efforts, and in some cases can facilitate other localisation efforts.

This part discusses some advanced technologies and suggestions to promote research on them.

Mobile technology

Mobile technology, facilitated by ongoing advances in miniaturisation, was discussed in previous sections (6.7, 7.6) and mentioned above (9.7) as an important new area for localisation. At least as regards cellphones, it is already rapidly becoming important in Africa. The potential for SMS, e-mail and other text in African languages on mobile devices, as well as possible interfaces with voice (such as the Swahili TTS mentioned above), may require attention to localisation standards. In the case of complex scripts, this raises issues not only of compatibility with what is used on computers but also of standards. What kinds of links can and should there be between companies involved in localising cellphones and handhelds and the localisation of computer software? In the case of Microsoft, there may be some synergies between its different efforts (computer and mobile software), but where does FOSS localisation fit in? With increasing miniaturisation and innovation in handhelds, a key question is whether the evolution of mobile devices for African contexts, a sector dominated by commercial interests, needs to be discussed. The commercial nature of the industry is not an issue, but the apparent lack of connection between it and other localisation efforts is worth examining. Moreover, the potential for localised software for handheld computers should be researched.
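One concrete way in which character repertoire affects mobile use is SMS encoding. A message that uses only the GSM default 7-bit alphabet fits 160 characters in a single SMS. A single character outside that alphabet (such as Wolof `ë` or Hausa `ɗ`) forces the whole message into UCS-2, which allows only 70 characters. This sketch assumes the GSM 03.38 basic table only; the extension table, which adds characters such as `€` and `[`, is ignored. It also does not model multipart messages or characters outside the Basic Multilingual Plane. The function name is hypothetical.

```python
# GSM 03.38 default alphabet, basic table only (the escape/extension table,
# which adds characters such as € [ ] { }, is ignored in this sketch).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def sms_encoding(text):
    """Return (encoding, max characters in one single-part SMS) for `text`."""
    if all(ch in GSM7_BASIC for ch in text):
        return "GSM-7", 160
    # Any character outside the basic table forces 16-bit UCS-2
    return "UCS-2", 70

print(sms_encoding("Nanga def?"))   # plain Latin letters
print(sms_encoding("Jërëjëf"))      # Wolof 'ë' is not in the basic table
```

In practice, then, users of many African orthographies get less than half the text per message unless mobile standards and handsets take their characters into account.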

Geographic Information Systems (GIS)

GIS, which permits digital manipulation of data and map images, was discussed in previous sections (6.7, 7.6). It merits further mention as a possible priority technology for local development planning, analysis, and education that can be localised. It is being used as part of participatory development projects in various parts of the world including Africa, and the inevitability of increased use in Africa points to the potential for localisation.

Given that there is at least one good FOSS GIS program, it is worth suggesting that (1) computers in all community and school telecentres and government offices in Africa be equipped with this software, and (2) a programme to localise it into major African languages be launched.

As a first step, the possibility of localising the GRASS software should be explored immediately with Swahili localisers (such as the Kilinux group in Tanzania), the GRASS technicians, and possible funders.

GIS might also be a useful tool in language and localisation planning. Although we start by discussing localisation of GIS for various applications, it might turn out that, given issues of geographic distribution of speakers of various languages, GIS can also change how we discuss and plan for localisation.

Cutting-Edge Language Technologies

In African contexts, characterised as they are by multiple languages, by traditions often described as more oral than written (in many cases written forms are not yet standardised), and by low literacy rates, conventional uses of ICT that focus almost exclusively on text will fail to take advantage of the many talents and ideas expressed in languages that lack a well-established written tradition. In addition, minority languages will remain at a disadvantage: the spoken language will rarely be written, and the few recordings that exist may never be transcribed in a form accessible to native speakers.

New technologies in principle offer new potentialities for the use of any language, and the languages in the most disadvantaged position stand to gain the most. Language and ICT policies need to take this into consideration. This in turn requires clarity of vision and communication among policymakers, researchers, activists, and native-speaking communities.

Some of the most advanced technologies dealing with transformations of speech and text may, as paradoxical as it may seem, actually be the most appropriate for African languages. These include several discussed in previous sections (6.7, 7.6):

• Speech synthesis and text-to-speech (TTS)
• Speech recognition and speech-to-text (STT)
• Advanced uses of (digital) audio
• Machine translation (MT) and translation memory

In addition to basic research, these areas could benefit from innovative thinking on long-term strategies for how to use and adapt such technologies to African realities.

Planning and Research

One idea would be to promote basic and applied research on the abovementioned areas in a long-term applied research program by a consortium of institutions in and outside of Africa. The philosophy would not be "catch up," but rather "go ahead" in the sense that ICT might find new and innovative uses in African contexts.

The idea would be to target, over the next decade or so, a range of cutting-edge technologies with specific aims. Some suggestions follow (note that for some of them, the basic issue of standardised orthography is important):

• Enhance and diversify the limited number of existing TTS applications. This might involve the Local Language Speech Technology Initiative (LLSTI), among others.

• Develop STT programs for a select number of languages, and investigate ways of streamlining production of such software for a variety of languages, and making it reliable across dialect differences of languages without standard dialects.

• Research the potential for African-language content and applications that do not rely primarily on text, namely audio-only and audio with images. The aim would be not only to enhance access for people with poor literacy skills and for the disabled, but also to explore new ways of interacting with and using the technology for all users in Africa and beyond. The possibility of involving Africa's oral history centres in such research should be considered.

• Explore uses of translation software (machine translation and translation memory) for African languages, including those less widely spoken. For instance, could translation memory banks be developed online for use in more than one location? Could the evolving range of approaches to machine translation be adapted to different language situations on the continent? How can this set of technologies address basic hurdles to translating materials into African languages, for instance for government services, education, and development?

• Involve and train African linguists and computer scientists with the aim of empowering a new generation of African research in natural language processing and localisation (see also above, 9.4).


• Link efforts for multilingual ICT innovation with basic localisation and ICT4D/E, with the ultimate aim of facilitating a range of work to serve African needs. This might involve developing new partnerships for both research and practice. One example might be localised GIS for rural development. Another might be distributing localised FOSS to new telecentres and then involving them in feedback on performance and improvements.
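The translation-memory idea mentioned in the list can be sketched in a few lines. A new source segment is compared with previously translated segments, and the stored translation of the closest match is offered if it is similar enough. This sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. Real translation-memory tools use their own matching and often exchange memories in the TMX format. The memory contents, the 0.75 threshold, and the function name are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: English source segment -> Swahili translation
TM = {
    "Save the file": "Hifadhi faili",
    "Open the file": "Fungua faili",
}

def tm_lookup(segment, memory, threshold=0.75):
    """Return (best source, its translation, similarity 0..1) for the closest
    stored segment, or None if no match reaches the threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None

print(tm_lookup("Save this file", TM))   # fuzzy match on "Save the file"
print(tm_lookup("Print the map", TM))    # None: nothing similar enough
```

A shared online memory, as suggested above, would amount to making such a store accessible to translators in several locations, so that each reused segment saves translation effort.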

Facilitating Other Kinds of Localisation

Previous sections (5.5, 8.1) mentioned software to facilitate the translation of software. Similarly, some of the abovementioned applications can be used to facilitate the localisation and creation of content in African languages. They may also find uses in localising software.

Where production of text in diverse languages is involved, the potential for use of translation software and STT to reduce the cost and time required to develop material for web content or even print (such as school materials) should be examined.

Finally, well-established scanning technologies, with OCR adapted as necessary for extended Latin and non-Latin scripts, could permit the digitisation of much published material on and in African languages for dissemination (with permission) to native-speaking communities via the web.


10. Conclusion

This document has attempted to survey the situation of, and potential for, localisation in Africa. As a key conceptual element in this consideration, a model of localisation ecology has been introduced. Background information on African languages and on aspects of ICT has also been covered, to serve as context for the discussion of current situations and the recommendations that complete the report.

While the general message of this document is that localisation of ICT in African languages is important and that the potential of ICT for various uses in African languages is great, it is also recognised that the path is not simple. Basic technological and educational conditions are not favourable, resources for localisation are few, policies are not actively supportive, and so on. Recommendations can be made, but some major questions remain:

• What are likely to be the next phases of localisation in Africa?
• Which aspects of localisation should be prioritised?
• Who, at the language, country, regional, working-language, and continental levels, will lead the efforts, and how can they be coordinated?
• How can initiatives for localisation best be encouraged, coordinated, and supported?
• How can policy and institutional support be gained and sustained?
• How can African expatriate and foreign volunteer support for localisation best be used to develop skills on the continent?
• Where will the resources for localisation come from (especially for less widely spoken and resource-poor languages)?
• How might the information in this document and the website in which it is integrated (as well as related sites) be used and further developed to best assist all these efforts?

The information in this document and its appendices is ultimately intended to serve as a resource for consideration of such strategic questions, as much as to assist individual localisation initiatives. Specific information is intended to benefit locally-directed efforts, while the aggregation of specifics in principle shows patterns and connections in consideration of a larger whole. As such these are part of a process, and as with any process there are cycles of evaluation and vision, of review and revision.

Africa is the second largest continent, with some of the greatest linguistic diversity. However it is not yet well placed to take full advantage of new ICTs, let alone to shape them, in order to best respond to the realities and aspirations of its quickly growing population. The expanding multilingual potentialities of ICT also encounter a language policy and sociolinguistic environment that is not currently well positioned to take advantage of these advances.

Moreover, localisation, however successfully it is achieved, is not an end in itself. At the beginning of the document the topic of the "digital divide" was raised, along with how localisation, by increasing access to and relevance of ICT, can contribute vitally to narrowing that divide. The vision of localisation, and consideration of how to achieve its sustainability, must therefore involve discussion of how research on and products of localisation can link with ICT4D and ICT4E projects.

In this and indeed a larger sense, localisation is not merely dependent on other forces (per localisation ecology) for its success and contributions. It also represents a new dynamic in social, economic, technological, educational, linguistic, and political development in Africa no less than in other parts of the world. How effectively that dynamic can benefit larger processes is dependent on attention, planning and action, and indeed unity.

The latter point deserves a special note on closing. Unity is a theme that has concerned individual African states, Africa-wide gatherings (some under the banner of PanAfricanism), and continental bodies such as the African Union. Promoting use of individual African languages on national levels has often run into debates about the effect of such policy on national unity. Paradoxically, on the continental level, discussion of the promotion of African languages has frequently resulted in favourable declarations and even action plans, but in the end these have not resulted in much action by individual countries (or for that matter, major donors).

Approaching the issue of localisation on a PanAfrican basis – which has its practical reasons related to the need for collaboration on rapidly advancing multilingual ICT – also resonates in principle with the purposes of Africa-wide conferences and statements on promotion of African languages going back four decades. However, it may risk confronting the same ideological roadblocks on the level of individual governments and even ICT authorities as have previous PanAfrican initiatives relating to Africa's indigenous languages. This is especially the case when the question of prioritising some more widely spoken languages is raised.

It is therefore important to emphasise that the diversity represented by localised content and software is not a disunifying force but a common enterprise, and that beginning with some languages will not exclude others but will develop resources and capacities to handle all. ICTs are in a sense additive by nature: advances for one language enhance rather than hinder the opportunities for work in other tongues.

Although it may seem paradoxical, in the same way that a single character coding system (Unicode) facilitates greater use of diverse scripts, a PanAfrican approach to localising ICT in many parts of Africa may yield the best results for each and all. Small projects and initiatives unconnected with each other, unaware of the importance of the localisation ecology in Africa, and uninvolved in broader world discussions of local language computing are unlikely to achieve sustainable results. Linked together, however, they can achieve that and more.

In the long run, the hope and potential of ICT, through its localisation and adaptation to Africa's languages and modes of communication, is to advance development in its broadest and most encompassing sense, the "revealing of potentialities." It is hoped that this document and the web-based resources with which it is integrated can, in some small but significant way, further that aim in and for Africa.


11. References

Note: The various profiles in the Appendices have other specialised references not noted here.

Afnan-Manns, Sheila, and Aimée Dorr. 2003. "Re-evaluating the Bridge! An Expanded Framework for Crossing the Digital Divide through Connectivity, Capability, and Content. A Report on the Digital Divide's Multiple Dimensions: Indicators for Measuring Success." Los Angeles: The Pacific Bell/UCLA Initiative for 21st Century Literacies at the UCLA Graduate School of Education & Information Studies. http://www.newliteracies.gseis.ucla.edu/publications/re-eval_bridge.pdf

Agence de Presse Sénégalaise (APS). 2005. "Un professeur d'anglais à la retraite invente une écriture dénommée 'Typafrica'." APS 06-05-2005. http://www.aps.sn/artfiche.php?page=&id_article=8389

Agence de Presse Sénégalaise (APS). 2004. "Microsoft s'engage à traduire ses logiciels en Wolof (Officiel)." APS 25-10-04. http://www.aps.sn/artfiche.php?page=id_article=3220

Aziz, Naeesa. 2004. "South Africa: Rainbow Nation Pursues 'Elusive Equity'" (Interview). AllAfrica.com, December 15, 2004. http://allafrica.com/stories/200412150903.html

Ballantyne, Peter. 2002. "Collecting and Propagating Local Development Content: Synthesis and Conclusions." Research Report No. 7. IICD in association with the Tanzania Commission for Science and Technology; funded by DFID. http://www.iconnect-online.org/base/ic_show_news?sc=107&id=1878

Bamgboṣe, Ayọ. 1991. Language and the Nation: The Language Question in Sub-Saharan Africa. Edinburgh: Edinburgh University Press.

______. 1996. "Pride and prejudice in multilingualism and development." In Richard Fardon and Graham Furniss (Eds), African Languages, Development, and the State. London: Routledge.

Benjamin, Peter. 2000. "African Experience with Telecenters." E-OTI [On the Internet] November-December 2000. http://www.isoc.org/oti/articles/1100/benjamin.html

Bergmann, Frank. 2005. "Open-Source Software and Localization." MultiLingual Computing 70 (Vol. 16, Issue 2). http://www.project-open.com/whitepapers/oss-l10n/

Bernard, H. Russell. 1996. "Language Preservation and Publishing." In N.H. Hornberger, ed. Indigenous Literacies in the Americas; Language Planning from the Bottom Up. New York: Mouton de Gruyter.

Bohannan, Paul, and Philip Curtin. 1971. Africa and Africans, rev. ed. Garden City, NY: Natural History Press.


______. 1995. Africa and Africans, 4th ed. Prospect Heights, IL: Waveland Press.

Bokamba, Eyamba G. 1995. "The Politics of Language Planning in Africa: Critical Choices for the 21st Century." In Martin Putz, ed. Discrimination Through Language in Africa?: Perspectives on the Namibian Experience. Mouton de Gruyter.

Bourbeau, Laurent, and François Pinard. 2000. "Observations, réflexions et perspectives de l'informatisation des langues africaines." Paper prepared for Bamako 2000 Internet - Les passerelles du développement - Rencontre organisée par le réseau ANAIS. Progiciels Bourbeau-Pinard Inc. (BPI). http://www.progiciels-bpi.ca/man/bam2000/index.html

Brock-Utne, Birgit. 2005. "Language-in-Education Policies and Practices in Africa with a Special Focus on Tanzania and South Africa: Insights from Research in Progress." In A.M.Y. Lin and P.W. Martin, eds. Decolonization, Globalization: Language-in-Education Policy and Practice. Multilingual Matters.

Campbell, David J., and Jennifer M. Olson. 1991. "Framework for Environment and Development: The Kite." CASID Occasional Paper No. 10. East Lansing, MI: Center for Advanced Study of International Development, Michigan State University.

Chanard, Christian. 2005. "Pour une transcription pérenne des langues africaines." Paper delivered at the 27th Internationalization and Unicode Conference (IUC27), Berlin, Germany, 6 April 2005.

Chanard, Christian, and Andrei Popescu-Belis. 2001. "Encodage informatique multilingue: application au contexte du Niger." Cahiers du Rifal, 22: 33-45.

Chaudenson, Robert. 2003. "Langues et numérisation: Français, créoles, langues africaines." In Isidore Ndaywel è Nziem, ed. Les langues africaines et créoles face à leur avenir. Paris: L'Harmattan. Pp. 131-152.

Cissé, Thierno, Chérif Mbodj, Marc Van Campenhoudt, and Mohamédoune Wane. 2004. "Expérimentation de normes de balisage en langues partenaires." In Penser la Francophonie: Concepts, actions et outils linguistiques. Actes des premières Journées scientifiques communes des réseaux de chercheurs concernant la langue, Ouagadougou (Burkina Faso), 31 mai – 1er juin 2004. Paris: AUF. Pp. 77-88. http://www.termisti.refer.org/ouagadougou.pdf

Coulmas, Florian. 1992. Language and Economy. Oxford: Blackwell.

Crawford, Susan. 2006. "Testing IDNs." CircleID. http://www.circleid.com/posts/testing_internationalized_domain_names_idns/

Crystal, David. 2004. The Language Revolution. Malden, MA: Polity Press.

Daily Champion. 2004. "Igbo: Endangered Language" (Editorial). December 20, 2004. http://allafrica.com/stories/200412201271.html


Diki-Kidiri, Marcel, and Atibakwa Baboya Edema. 2003. "Les langues africaines sur la Toile." Cahiers du Rifal, 23: 5-32.

Diki-Kidiri, Marcel, Chérif Mbodj, and Atibakwa Baboya Edema. 1997. "Des lexiques en langues africaines (Sängö, Wolof, Lingala) pour l'utilisateur de l'ordinateur." Meta 42(1): 94-109 http://www.erudit.org/revue/meta/1997/v42/n1/003313ar.pdf

Diop, Cheikh Anta. 1955. Nations nègres et culture. Paris: Éditions africaines.

Duncan, Otis Dudley. 1959. "Human Ecology and Population Studies." In P.M. Hauser and O.D. Duncan, eds. The Study of Population: An Inventory and Appraisal. Chicago: University of Chicago Press. Pp. 678-716.

Dwyer, David. 1997. A Webbook of African Language Resources. African Studies Center, Michigan State University. http://www.isp.msu.edu/AfrLang/hiermenu.html

Dwyer, David. 1987. A Handbook of African Language Resources. African Studies Center, Michigan State University.

Elder, Laurent. 2002. "What a Difference IT Makes …" (an Acacia project site visit). http://web.idrc.ca/en/ev-9481-201-1-DO_TOPIC.html

Elugbe, Ben. 1998. "Cross-border and Major Languages of Africa." In K. Legère, ed. Cross-border languages : reports and studies, Regional Workshop on Cross-Border Languages, National Institute for Educational Development (NIED), Okahandja, 23-27 September 1996. Windhoek : Gamsberg Macmillan.

Enguehard, Chantal, and Chérif Mbodj. 2005. "Des correcteurs orthographiques pour les langues africaines." In S. Vienney et M. Bioud, eds. Bulag 29 - Correction automatique : bilan et perspectives. Centre Tesnière, Université de Franche-Comté. http://www.univ-fcomte.fr/download/pufc/document/doc_en_ligne/ouvrages_en_ligne/bulag_29_.pdf

Enguehard, Chantal, and Chérif Mbodj. 2003. "Flore: un site coopératif pour recueillir et diffuser les noms des plantes dans les langues africaines." Cahiers du Rifal, 23: 46-54.

Fantognan, Xavier. 2005. "A Note on African Languages on the Worldwide Web." In Paolillo, John, Daniel Pimienta, Daniel Prado, et al, eds. Measuring linguistic diversity on the Internet. A collection of papers. Montreal: UNESCO. (CI.2005/WS/06) http://unesdoc.unesco.org/images/0014/001421/142186e.pdf

Fill, Alwin. 2001. "Ecolinguistics: State of the Art 1998." In A. Fill and P. Mühlhäusler, eds. The Ecolinguistics Reader. London: Continuum. Pp. 43-53.

Gadelli, Karl Erland. 1999. "Language Planning: Theory and Practice. Evolution of Language Planning Cases Worldwide." Paper for the Languages Division, Education Sector, UNESCO. Paris: UNESCO.


Gillwald, Alison (ed.). 2005. "Towards an African e-Index: Household and individual ICT Access and Usage Across 10 African Countries." Research ICT Africa!, http://www.researchictafrica.net/modules.php?op=modload&name=News&file=article&sid=504

Gordon, Raymond G., Jr. (ed.), 2005. Ethnologue: Languages of the World, 15th edition. Dallas: SIL International. Online version: http://www.ethnologue.com/

Hartell, Rhonda L., ed. 1993. The Alphabets of Africa. Dakar: UNESCO and Summer Institute of Linguistics. (The French edition, published the same year, is entitled Alphabets de Langues Africaines).

Haugen, Einar. 2001. "The Ecology of Language." In A. Fill and P. Mühlhäusler, eds. The Ecolinguistics Reader. London: Continuum. Pp. 57-66. [Originally published in: A.S. Dil, ed. 1972. The Ecology of Language: Essays by Einar Haugen. Stanford: Stanford University Press.]

Herbert, Robert K. 1992. "Language in a Divided Society." In R.K. Herbert, ed. Language and Society in Africa: The Theory and Practice of Sociolinguistics. Witwatersrand University Press. Pp. 1-19.

Hosken, Martin. 2003. "An introduction to keyboard design theory: What goes where?" SIL. http://scripts.sil.org/KeybrdDesign

IDRC. 2005. "The Acacia Atlas: Mapping African ICT Growth." Ottawa: IDRC.

International Organization for Standardization. 2006. "ISO and Africa." June 2006. http://www.iso.org/iso/en/comms-markets/developingcountries/pdf/iso_and_africa.pdf

International Organization for Standardization. 2004. "ISO 15924:2004. Information and documentation -- Codes for the representation of names of scripts." January 2004.

International Organization for Standardization. 2002. "ISO 639-1:2002. Codes for the representation of names of languages -- Part 1: Alpha-2 code."

International Organization for Standardization. 1998. "ISO 639-2:1998. Codes for the representation of names of languages -- Part 2: Alpha-3 code, first edition."

International Organization for Standardization. 1988. "ISO 3166:1988. Codes for the representation of names of countries, 3rd edition." August 1988.

Internet World Stats: Africa. 2006. http://internetworldstats.com/africa.htm

Jensen, Mike. 1998. "African Internet Connectivity: Summary of International ICT Development Projects in Africa." http://www3.sn.apc.org/africa/projects.htm

______. 2002. "Information and Communication Technologies (ICTs) in Africa – A Status Report." http://www.itu.int/osg/spu/wsis-themes/UNMDG/jensen-icts-africa.doc


Joshi, R.M., and P.G. Aaron, eds. 2005. Handbook of Orthography and Literacy. Lawrence Erlbaum Associates.

Leclerc, Jacques. n.d. "L'aménagement linguistique dans le monde." http://www.tlfq.ulaval.ca/axl/afrique/afracc.htm

Lotanna, Adiekwue A. 2005. "Revitalising the Igbo Language." Daily Champion November 28, 2005. http://allafrica.com/stories/200511280234.html

Mackey, William F. 1989. "Status of Languages in Multinational Societies." In U. Ammon, ed. Status and Function of Language and Language Varieties. Berlin: Walter de Gruyter. Pp. 3-20.

Mann, Michael, and David Dalby. 1987. A thesaurus of African languages: A classified and annotated inventory of the spoken languages of Africa with an appendix on their written representation. London: Hans Zell Publishers.

Mas, Jordi. 2003. "La salut del català a Internet." Softcatalà. http://www.softcatala.org/articles/article26.htm

Matwyshyn, Andrea M. 2003. "Silicon Ceilings: Information Technology Equity, the Digital Divide and the Gender Gap among Information Technology Professionals." Northwestern Journal of Technology and Intellectual Property 2(1). http://www.law.northwestern.edu/journals/njtip/v2/n1/2

Mazrui, Ali A., and Alamin M. Mazrui. 1998. The Power of Babel: Language and Governance in the African Experience. Chicago: University of Chicago Press.

Microsoft Corporation. 2004. "Microsoft Enables Millions More to Experience Personal Computing Through Local Language Program." PressPass - Information for Journalists, March 16, 2004. http://www.microsoft.com/presspass/press/2004/mar04/03-16LLPPR.asp

Miller Esselaar and Associates. 2001. "A Country ICT Survey for Tanzania: Final Report," November 2001, Prepared for Swedish International Development Cooperation Agency http://www.milless.co.za/downloads/Sida%20report%20-%20Tanzania.pdf

Ndaywel è Nziem, Isidore, ed. 2003. Les langues africaines et créoles face à leur avenir. Paris: L'Harmattan.

Okombo, D. Okoth. 2001. "Language Policy: The Forgotten Parameter in African Development and Governance Strategies." Inaugural Lecture, University of Nairobi.

Ongarora, David Ogoti. 2002. "African Languages in Development: Prospects and Encumbrances." In Francis R. Owino, ed. Speaking African: African Languages for Education and Development. Cape Town: CASAS. Pp. 63-75.

"Open Source's Local Heroes." 2003. The Economist 4 December 2003


Osborn, Donald. 2005. "African Languages and Information and Communication Technology: Localizing the Future." Localisation Focus, 4(2), June 2005.

Osborn, Donald. 2004. "African Languages and ICT: Literacy, Access and the Future." Paper presented at the Annual Conference of African Linguistics, Harvard University, Cambridge, MA - April 2-4, 2004

Osborn, Donald. 2001. "The Knotty Problem of Using African Languages for E-Mail and Internet." Balancing Act's News Update, No. 69. http://www.balancingact-africa.com/news/back/balancing-act_69.html

Osborn, Donald. 1999. "Ultimate Development Participation: Institutionalizing Indigenous Language Use in Education and Research." In Proceedings of the 23rd Annual Third World Conference, Chicago, March 19-22, 1997. Chicago: Third World Conference Foundation.

Otter, Alistair. 2004. "Uganda gets indigenous language browser." Tectonic, 15 Sept. 2004. http://www.tectonic.co.za/view.php?id=342

Paolillo, John. 2005. "Language Diversity on the Internet." In Paolillo, John, Daniel Pimienta, Daniel Prado, et al., eds. Measuring linguistic diversity on the Internet. A collection of papers. Montreal: UNESCO. (CI.2005/WS/06) http://unesdoc.unesco.org/images/0014/001421/142186e.pdf

Pastore, Michael. 2000. "Web pages by language." ClickZ Stats, July 5, 2000. http://www.clickz.com/stats/big_picture/demographics/article.php/5901_408521

Philips, John Edward. 2000. Spurious Arabic: Hausa and Colonial Nigeria. Madison, Wisconsin: African Studies Program, University of Wisconsin.

Prah, Kwesi Kwaa. 2003. "Going Native: Language of Instruction for Education, Development and African Emancipation." In B. Brock-Utne, Z. Desai and M. Qorro (eds). Language of Instruction in Tanzania and South Africa (LOITASA). Dar-es-Salaam: E & D Limited. http://www.casas.co.za/papers_native.htm

______. 2002. "Language, Neo-colonialism, and the African Development Challenge." TRIcontinental, No. 150. http://www.casas.co.za/papers_language.htm

______. 2000. Mother Tongue for Scientific and Technological Development in Africa, 3rd ed. Cape Town: Centre for Advanced Studies of African Society.

Rambo, A. Terry. 1983. "Conceptual Approaches to Human Ecology." East-West Environment and Policy Institute Research Report No. 14. Honolulu: East-West Environment and Policy Institute.

Rathke, Eike. 2005. "Internationalization for Localization (i18n for l10n)." OpenOffice.org Conference 2005, Koper/Capodistria, Slovenia.


Renaud, Pascal. 1994. "Le projet RIO, historique, organisation, partenaires." http://www.unitar.org/isd/publications/rio%5Cprog-rio94.html

Robinson, Clinton D.W. 1996. Language Use in Rural Development: An African Perspective. Berlin: Mouton de Gruyter.

Secka, Pa Modou. 2005. "Local Alphabet to Be Launched." The Independent (Banjul). February 25, 2005. http://www.qanet.gm/Independent/independent.html http://allafrica.com/stories/200502250471.html

Senge, Peter. 2006. The Fifth Discipline: The Art & Practice of The Learning Organization, Rev. ed. New York: Doubleday/Currency.

Senne, Damaria. 2006. "Zulu, Xhosa SMS made possible." ITWeb. http://www.itweb.co.za/sections/telecoms/2006/0606231042.asp?S=Cellular&A=CEL&O=FRGN

Shanglee, Russell. 2004. "Localization in African Languages: Translators Face Linguistic Challenges as They Localize Modern Technology." Multilingual Computing and Technology 61 (Vol. 15, Issue 1).

Simala, Inyani K. 2002. "Empowering indigenous African languages for sustainable development." In Francis R. Owino, ed. Speaking African: African Languages for Education and Development (pp. 45-53). Cape Town: CASAS.

Sola, Javier. 2005. "Preparing a Free and Open Source Software Localisation and Deployment Project." Version 0.6, 24/12/2005. Open Source Localisation Toolkit.

Sow, Alfâ Ibrâhîm, ed. 1977. Langues et politiques de langues en Afrique noire : l'Expérience de l'UNESCO. Paris: Nubia.

Suzuki, Izumi, Yoshiki Mikami, Ario Ohsato, and Yoshihide Chubachi. 2002. "A language and character set determination method based on N-gram statistics." ACM Transactions on Asian Language Information Processing 1(3): 269-278. http://portal.acm.org/affiliated/citation.cfm?id=772759&dl=guide&coll=ACM&CFID=15151515&CFTOKEN=6184618

Tadadjeu, Maurice. 1993. "Cameroon." In R.H. Hartell, ed. The Alphabets of Africa. Dakar: UNESCO and SIL.

Tadadjeu, Maurice, and Etienne Sadembouo, eds. 1984. General alphabet of Cameroon languages, adopted by the National Committee for the Unification and Harmonization of the Alphabets of Cameroon Languages from 7th to 9th March 1979 in Yaounde. Yaounde: University of Yaounde, Faculty of Letters and Social Sciences, Dept. of African Languages and Linguistics.


TeleCommons Group. 2000. "Rural Access to Information and Communication Technologies: The Challenge of Africa." Prepared for the African Connection Secretariat, with support from the Information for Development Program (infoDev) http://www.telecommons.com/reports.cfm?itemid=122

Texin, Tex. 2006. "What is wrong with Locales?" Unpublished presentation. http://www.i18nguy.com/locales/Locales.pdf

Togocity.com. 2006. "L’écriture informatisée de la Langue EWE est maintenant possible." March 8, 2006 http://www.togocity.com/article.php3?id_article=914

UNECA. NICI in Africa. http://www.uneca.org/aisi/nici/country_profiles/

UNESCO. 2005. "Measuring Linguistic Diversity on the Internet." http://portal.unesco.org/ci/en/ev.php-URL_ID=20804&URL_DO=DO_TOPIC&URL_SECTION=201.html

USINFO. 2006. "Internet Connections Growing Fastest in Africa." U.S. Department of State, Bureau of International Information Programs. http://usinfo.state.gov/dhr/Archive/2006/Apr/28-229950.html

Van der Veken, Anneleen, and Gilles-Maurice de Schryver. 2003. "Les langues africaines sur la Toile: Etude des cas haoussa, somali, lingala et isixhosa." Cahiers du Rifal, 23: 33-45.

Wiley, David, and David Dwyer. 1980. "African Language Instruction in the United States: Directions and Priorities for the 1980s." Michigan State University, African Studies Center.

Williamson, Kay. 1984. Practical orthography in Nigeria. Ibadan: Heinemann Educational Books Ltd.

Yacob, Daniel. 2004. "Localize or be Localized: An Assessment of Localization Frameworks." Paper presented at the International Symposium on ICT: Education and Application in Developing Countries, Addis Ababa, 19-21 October 2004.


12. Appendices

There are five appendices, each of which continues to evolve on the PanAfrican Localisation project wiki as “L10n (Localisation) Profiles” (http://www.panafril10n.org/wikidoc/pmwiki.php/PanAfrLoc/Profiles). They are:

1. Major Languages (93, plus pages on four other lists of major African languages)

2. Writing Systems (11)

3. Countries (54)

4. InterAfrican Organisations (numerous links and some separate pages on the wiki)

5. L10n Tools (numerous links and some separate pages on the wiki)

Each of these appendices includes a title page and then a number of subpages (the number for the first three is indicated in parentheses) that are accessible by link within the wiki from the title page. In this section only the title pages are reproduced. A link at the end of each appendix leads to the corresponding wiki page.

In order to more fully understand the levels of usage of African languages and Arabic in ICT, and the kinds of software and content localization that are being undertaken in the region, it is helpful to attempt a country-by-country survey and to consider what is being done with some of the most widely spoken languages on the continent.

Once languages are discussed in the context of text on computers, writing systems must also be considered. There are also language-related issues relevant to localisation that, like so many African languages, cross borders and involve more than one country. The large scope of the project quickly becomes apparent.

This undertaking is therefore ambitious. If we nonetheless attempt it in a document of this size and in a relatively short time, it is because there does not yet seem to be much activity despite the emerging need.

Such a study serves several purposes within (and beyond) this project. First, it will offer "clarity through specificity" to discussions of localisation. Second, it will serve as a baseline against which to measure change. Third, it will inform work on the website. Fourth, beyond the immediate aims of the project, it will help raise awareness, on the continent and among those abroad who would assist Africa in matters of ICT, of the multilingual applications of the technology, both current and potential.


The survey is divided into four sections for several reasons: the history of the region, in particular the way colonial borders tended to split language groups; the fact that current ICT and language policies tend to be country-specific; and the fact that, at the same time, there have been inter-African discussions and agencies concerned with language since independence.

It is therefore helpful to be able to frame and reframe questions of localisation to meet these multiple realities in the most appropriate ways.


12.1 Appendix I: Major Languages of Africa

Language is the central consideration in localisation, and questions regarding choice of languages to localise in, choice of dialect when a language has more than one, and prioritization of languages when there will be work on several, are likely to confront any localisation effort. Therefore it is useful to have more information on the languages, their interrelationship, and their contemporary use. In brief, it is important to look at specifics to the extent possible when discussing localisation in Africa.

This section therefore provides profiles of a select group of the most widely spoken languages in Africa. An explanation of the choice of languages follows the list of language profiles, below.

1. Language Profiles for Localisation

(A template of the topics covered in each language profile provides explanations of the topics.)

• Afrikaans • Akan • Amharic • Anyi, Baule (French: Agni, Baoulé) • Arabic • Bamileke • Bedawi (Beja) • Bemba • Beti (Ewondo, Fang, Bulu) • Berber • Chewa, Nyanja • Chokwe, Ruund • Dagaare • Dinka • Ebira (Igbera) • Edo (Bini) • Efik, Ibibio, Anaang • Fula (French: Peul) (Fulfulde, Pulaar, Pular) • Ganda (Luganda, oluGanda..) • Gbaya • Gbe (Ewe, Mina, Fon) • Gikuyu • Gogo • Gurage • Hausa (French: Haoussa) • Hehe • Idoma • Igbo • Ijo • Isle de France Creole • Kalenjin (Nandi, Kipsigis)


• Kamba • Kanuri • Kimbundu • Kongo (Kituba) • Kpelle (French: Guerzé) • Krio, Pidgin (Cluster) • Kru, Bassa • Lingala • Lozi (Silozi) • Luba (Chiluba) • Luo, Acholi, Lango • Luyia • Maasai • Makua, Lomwe • Malagasy • Manding (French: Mandingue) (Bamanan/Jula/Mandinka/Maninka) • Mende, Bandi, Loko • Meru • Mongo, Nkundo • Moore • Nama • Ndebele, Northern • Ndebele, Southern • Nubian • Nuer • Nupe • Nyakusa • Oromo • Oshiwambo • Runyakitara • Rwanda, Rundi • Sango • Sara • Senufo (Senari) • Serer • Shona • Sidamo • Somali • Songhai, Zarma • Soninke • Sotho, Northern (Sepedi) • Sotho, Southern (Sesotho) • Sukuma, Nyamwezi • Suppire, Minianka • Susu (& Yalunka) (French: Soussou, Djalonké) • Swahili (Kiswahili) • Swazi • Temne • Teso, Turkana • Tigrinya • Tiv


• Tsonga • Tswana (Setswana) • Tumbuka • Umbundu • Venda • Wolof • Xhosa • Yao, Makonde • Yoruba • Zande • Zulu

2. Major Languages of Africa - explanation

2.1 Many languages, many dialects

The figure of 2000 languages is often cited for Africa, representing about a third of the living languages of the world. Much depends on how one defines a language. Many languages have dialectal variants, and in many cases tongues with different names are so closely related that native speakers can communicate. At what point is a variant so distinct as to be considered a separate language? When can different variants be treated as a unit? These are critical questions for localisation in many cases, and sometimes there may be more than one answer depending on the nature and goals of the localisation.

SIL International, through Ethnologue, its well-known encyclopaedic effort to document all human languages, has tended to classify dialects as separate languages. While this may be appropriate when considering the linguistic characteristics of a particular tongue and the precise way to translate important texts, it arguably matters less for verbal communication and for less exacting text requirements such as a set of commands in a software interface.

At the other end of the spectrum is the tendency to group together interintelligible tongues, usually dialects of a language. This is the approach, for instance, of the Centre for Advanced Studies of African Society (CASAS) in its advocacy and research work. For the purposes of this study, similar tongues will be considered together, though reference will be made to Ethnologue's linguistic information.

2.2 Choice of languages for this document

The question then arises how to choose which languages to consider in this report. A useful list is that arrived at for an entirely different purpose - prioritizing African language instruction efforts at U.S. universities. In 1979, specialists in African languages identified a total of 83 languages (some grouped) based on their importance in terms of number of speakers and regional use. A project headed by linguist David Dwyer of Michigan State University compiled information on these languages for publication (Dwyer 1987) and eventual posting on a website (Webbook of African Language Resources 1997). Along the way the number of languages was raised to 85 by the splitting up of one large grouping of southern African languages.


Because that list was not compiled with localisation in mind, it was recognized when adopting it for this PanAfrican Localisation project that it would probably need to be modified. In addition, several questions were identified regarding the choice of languages, including:

• For some smaller countries, their main indigenous language(s) are not included - is this a problem and should all countries in Africa have at least one language represented in the list?

• How appropriate are some of the language categories that concern clusters of languages? That is, a category may include tongues that are different enough that, even though they may share a common origin, be closely related, or bear the same name, they cannot be considered a single unit for localisation purposes.

• Should we subdivide the list of languages by priority for attention as the abovementioned Handbook does?

• Are there other sources that should be consulted about the list of languages and the inclusion of others?

We began this list in 2005 with the 85 languages in the Webbook, in order to get the process started and as a basis for discussion in considering what ICT has to offer for the larger number of African languages. Much of the initial information was drawn from the Webbook, with additional information from sources such as Ethnologue added progressively.

3. Changes to the list of Major Languages

3.1 Changes in 2005

Several changes have been made as of early August 2005:

• Afrikaans has been added as it is the third major language in South Africa in terms of speakers, and also spoken in Namibia. (It was not in the Webbook listing in early 2005, but has since been added.)

• Beti has been added as it is a major cluster of interintelligible languages (Ewondo, Fang, Bulu) in southern Cameroon, Equatorial Guinea, and Gabon. (It was not in the Webbook listing.)

• Kikuyu has been changed to Gikuyu, which is now the preferred and more common usage.
• Runyakitara has been added, incorporating the previous information on Nyoro, one of the four related tongues covered by this standardization. It is spoken mainly in Uganda.
• Northern Sotho (Sepedi), Southern Sotho (Sesotho), and Tswana have been separated into individual pages. They are listed separately as "official languages" in South Africa and, despite their similarity, have separate literatures.

3.2 Changes in 2006

In August-September 2006 five other lists of languages in Africa were reviewed to get an idea of which ones other experts considered as meriting special attention. These included:

1. A list of 50 cross-border languages and language clusters compiled by linguist Ben Elugbe of the University of Ibadan. This was part of a paper presented at a conference on cross-border languages held in Okahandja, Namibia in 1996 (Elugbe 1998; Legère 1998).


2. A list of about 400 languages published by French development expert Michel Malherbe (2000). Some of these are indicated as most important in terms of usage.

3. A list of 159 "community languages" published in 1985 as part of a survey of use of African languages in literacy and education.

4. A list of 12-15 "core languages" that, according to linguist Kwesi Kwaa Prah (2002, 2003), are spoken by 75-85% of Africans as first or additional languages.

5. A list of nationally and areally dominant languages by country, from an appendix to a book by linguist Herman Batibo of the University of Dar es Salaam (Batibo 2005).

Like the list of languages covered in Dwyer's Handbook/Webbook, these lists served purposes other than localisation of ICT. Nevertheless, they offer a helpful way to check whether there may be other languages that ought to be added based on number of speakers and importance of use.

Based on the information from those lists, the following languages were added (these include the numerically most significant of the languages from Elugbe's list that were not previously on this list ...):

• Bedawi or Beja - a language of about a million speakers in Sudan and Eritrea.
• Dagaare - a language (or cluster) of about a million speakers in northern Ghana and southern Burkina Faso.
• Kwanyama - a language of about a million speakers in southeastern Angola and the Caprivi region of Namibia. It has been the subject of an effort at cross-border planning.

3.3 Changes in 2007

A new page on Oshiwambo replaces the one on Kwanyama. Oshiwambo is an appellation covering the very closely related tongues Kwanyama, Ndonga, and Kwambi.

The article on Ndebele has been split into separate articles on Northern Ndebele, a Nguni language, and Southern Ndebele, which is closer to the Sotho languages.

4. Other languages

Ultimately, the selection of languages in the list does not indicate a determination that localization must take place in them, or a decision that localization is unimportant in other tongues. It is merely a way to begin discussing the specifics of localisation.

If you would like to suggest other languages, please enter them on the More Languages page with a brief explanation justifying their inclusion in this list.


12.2 Appendix II: Writing Systems of Africa

When considering localisation on the African continent, it is often necessary to consider the scripts used.

The whole continent to one degree or another uses the Latin script. This is a legacy of history and current global realities. And indeed, many countries use only this script, though often with additional modified characters. The second most widely used, and dominant in North Africa, is the Arabic script.

However, there are also other living writing systems long established on the continent, such as Tifinagh and Ge'ez/Ethiopic, as well as newer indigenous ones (such as Vai, Mende KiKaKui, Bamum, N'ko, and Mandombe).

Pages on the major scripts give background (historical, and technical with regard to computing in Africa) and brief overviews of technical issues important to localisation.

1. List of scripts used in Africa

1.1 Major scripts used in many countries for many languages

• Latin script • Arabic script

1.2 Regionally important scripts

• Ge'ez/Ethiopic • Tifinagh • N'ko

1.3 Language-specific scripts

• Osmaniyya • Vai script • KiKaKui (Mende) • Bamum script • Mandombe

1.4 Historically important scripts

• Coptic script • Old Nubian script


2. Codes for the representation of names of scripts

ISO 15924 defines four-letter codes and three-digit numeric codes for writing systems, including many of the above. These may be used in locale definitions or in HTML markup, but in many cases they are not necessary (for instance, a code for Arabic script is not needed for a webpage in the Arabic language). For a complete list of the codes, see http://www.unicode.org/iso15924/iso15924-codes.html

These codes are also available as part of the IANA registry of codes at http://www.iana.org/assignments/language-subtag-registry
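Because the IANA registry is a plain-text file in a "record-jar" layout (records separated by lines containing only %%, each field written as Name: value, with continuation lines indented), script codes and related fields can be extracted with a few lines of code. The following Python sketch assumes that layout; the sample records are an abbreviated, illustrative excerpt rather than a copy of the live registry, and the function names are hypothetical. The Suppress-Script field, shown here for Amharic, is the registry's way of recording the situation described above, where a script code is normally unnecessary for a given language.

```python
from typing import Dict, List

# Each record maps a field name to all of its values, because some fields
# (for example Description) may occur more than once in a record.
Record = Dict[str, List[str]]

# Abbreviated, illustrative excerpt written in the registry's layout;
# consult the live registry for authoritative entries.
SAMPLE = """\
Type: script
Subtag: Ethi
Description: Ethiopic
%%
Type: script
Subtag: Tfng
Description: Tifinagh
%%
Type: language
Subtag: am
Description: Amharic
Suppress-Script: Ethi
"""


def parse_registry(text: str) -> List[Record]:
    """Split record-jar text into records of "Field-Name: value" lines."""
    records: List[Record] = []
    current: Record = {}
    last_field = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the current record, if any.
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and last_field is not None:
            # Continuation line: fold it into the previous value.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            name = name.strip()
            current.setdefault(name, []).append(value.strip())
            last_field = name
    if current:
        records.append(current)
    return records


def script_codes(records: List[Record]) -> Dict[str, str]:
    """Map each script subtag (an ISO 15924 code) to its first description."""
    return {
        r["Subtag"][0]: r["Description"][0]
        for r in records
        if r.get("Type") == ["script"]
    }


records = parse_registry(SAMPLE)
print(script_codes(records))          # {'Ethi': 'Ethiopic', 'Tfng': 'Tifinagh'}
print(records[2]["Suppress-Script"])  # ['Ethi']
```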

3. Some websites about writing systems in Africa

3.1 General

• Afrikan Alphabets http://www.ziva.org.zw/afrikan.htm
• African Writing Systems http://www.library.cornell.edu/africana/Writing_Systems/Welcome.html
• Wikipedia, "Writing systems of Africa," http://en.wikipedia.org/wiki/Writing_systems_of_Africa

3.2 Character/alphabet information

• Systèmes alphabétiques des langues africaines (Alphabetic systems of African languages) http://sumale.vjf.cnrs.fr/phono/index.htm
• See also specific language and country profiles

4. Issues in the display of text in African languages

A brief overview of ways people adapt African language orthographies for text in ICT is given in a revised version of an article from several years ago: African Language Text Issues.
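One recurring text issue is that many African orthographies combine a base letter with tone marks and other diacritics. Some combinations exist as a single precomposed Unicode character, while others exist only as sequences of a base letter plus combining marks. Applying one normalization form consistently (NFC is assumed here) helps with searching, sorting and comparing text. The following is a minimal Python sketch using the standard unicodedata module; the helper name is hypothetical.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """List the code points in a string with their Unicode character names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Yoruba o with dot below and grave (low) tone, typed as three code points.
typed = "o\u0323\u0300"
nfc = unicodedata.normalize("NFC", typed)

# NFC merges o + dot below into the precomposed letter U+1ECD, but there is
# no precomposed form that also carries the tone mark, so two code points remain.
for line in describe(nfc):
    print(line)
# U+1ECD LATIN SMALL LETTER O WITH DOT BELOW
# U+0300 COMBINING GRAVE ACCENT

# Open e with an acute tone mark has no precomposed form at all.
print(len(unicodedata.normalize("NFC", "\u025B\u0301")))  # 2
```

Fonts and keyboards therefore need to handle such sequences correctly, and not only single precomposed characters.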

5. Other ways of looking at what is a "writing system"

There are other ways to look at what a "writing system" is, which include various kinds of designs and graphical elements that can be used individually or in combination to convey concepts and ideas. The African Writing Systems website listed above has some information on these.

In the context of localization, however, we are focusing on scripts that represent the full range of a language's communication.


12.3 Appendix III: Countries of Africa

This section includes brief profiles of the ICT and language situation of all countries of the African continent.

Policies, including those relating to languages and to ICT, are of course defined within countries by their respective governmental systems. Localisation efforts in Africa also tend to focus on the country in which they are based. It is essential therefore to consider various factors on the country level - at which some languages are grouped, and others divided - and how these affect localisation.

Each country page has: a summary of languages spoken there with mention of language policies; a summary of the ICT situation, with attention to software issues and ICT policy; and a summary of what we have learned about the localisation situation there.

The list of countries (and one territory) follows in alphabetical order:

• Algeria • Angola • Benin • Botswana • Burkina Faso • Burundi • Cameroon • Cape Verde • Central African Republic • Chad • Comoros • Congo (Brazzaville) • Congo, Dem. Rep. (Kinshasa) [ex-Zaire] • Côte d'Ivoire • Djibouti • Egypt • Equatorial Guinea • Eritrea • Ethiopia • Gabon • Gambia • Ghana • Guinea • Guinea Bissau • Ivory Coast (see Côte d'Ivoire) • Kenya • Lesotho


• Liberia • Libya • Madagascar • Malawi • Mali • Mauritania • Mauritius • Morocco • Mozambique • Namibia • Niger • Nigeria • Reunion • Rwanda • São Tomé e Príncipe • Senegal • Seychelles • Sierra Leone • Somalia • South Africa • Sudan • Swaziland • Tanzania • Togo • Tunisia • Uganda • Zambia • Zimbabwe


12.4 Appendix IV: InterAfrican Dimensions

There are a number of agencies, institutions and organizations in Africa whose mandate or scope of activity is not limited to one country, but spans a region within Africa or the whole continent. This page will build a list of such organisations that deal in one way or another with languages, ICT, and localisation.

In addition, many of the languages of Africa cross the borders of two or more countries. To varying degrees some of the language-oriented institutions deal with these. Reference links to these languages will be included as well.

1. Agencies, Institutions & Organisations

1.1 Language

Agencies, institutions and organisations that are in some way involved in the work or study of African languages (or linguistics) in more than one country.

Intergovernmental

• ACALAN • CELHTO • CERDOTOLA • CICIBA • CIDLO • EACROTANAL

Non-governmental

• BASAL • CASAS • SIL

Academic

• WARC/CROA

1.2 ICT

Agencies, institutions and organisations that are in some way involved in work on, or promotion of, the use of ICT in more than one country.


Conferences

FOSS groups

• AAUL (Association Africaine des Utilisateurs de Logiciels Libres)

Intergovernmental (This concerns ICT; see above for intergovernmental organisations focusing on language.)

• AISI (African Information Society Initiative)

1.3 Localisation

Agencies, institutions and organisations that are in some way involved in work on, or promotion of, localisation of software and/or content in more than one country.

African & active in more than one country

• Translate.org.za • Kasahorow ?

External & active in more than one country

• LLSTI • RIFAL

2. Cross-border languages

List of cross-border languages compiled by Dr. Ben Elugbe (1996)

3. Regional dynamics

• Maghreb • Sahel ? • others...


12.5 Appendix V: Localisation (L10n) Tools

This page brings together some information and links of general use to localisation efforts, and multilingual ICT, in and for Africa. Particular attention is given to what are sometimes called "tools" that assist in localising software or content. This information, together with specific language, country and script information from the profiles, is intended to be of use to localisers. (Be sure to consult also the relevant language, country, and if necessary writing system pages for resources specific to the work you are doing.)

1. Understanding How the Parts Fit Together

• Standards
  o ISO

• Daniel Yacob's "L10n Layercake"

2. Keyboards & Input

2.1 Production keyboards

• Konyin

2.2 Keyboard layout creators

Computer keyboard layouts are generally defined in the software for a language. In the case of major European languages, Arabic, and some others, there are production keyboards that match these layouts (the only example for Africa is the Konyin keyboard). When localising software for another language, one can specify the keyboard layout. It is also possible to create a keyboard layout to meet the needs of many languages, for use with other software. This can be done using a keyboard layout creator such as one of the following:

• Tavultesoft Keyman (page includes links to Tavultesoft site & available keyboards)
• MSKLC (page includes links to MSKLC page & available keyboards)
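The file formats used by Keyman and MSKLC are not reproduced here. The idea behind many such layouts can, however, be sketched in a few lines of Python. In this purely hypothetical layout, a "dead key" (the semicolon) changes the meaning of the next key pressed, so that a standard keyboard can produce letters such as ɛ, ɔ and ŋ. The mapping table and function name are assumptions for illustration.

```python
# Hypothetical dead-key table: after ";" the next key is replaced.
DEAD_KEY = ";"
DEAD_KEY_MAP = {
    "e": "\u025B",  # ɛ LATIN SMALL LETTER OPEN E
    "E": "\u0190",  # Ɛ LATIN CAPITAL LETTER OPEN E
    "o": "\u0254",  # ɔ LATIN SMALL LETTER OPEN O
    "O": "\u0186",  # Ɔ LATIN CAPITAL LETTER OPEN O
    "n": "\u014B",  # ŋ LATIN SMALL LETTER ENG
    "N": "\u014A",  # Ŋ LATIN CAPITAL LETTER ENG
}


def apply_layout(keystrokes: str) -> str:
    """Turn a sequence of key presses into text under the dead-key rules."""
    out = []
    pending = False  # True right after the dead key was pressed
    for key in keystrokes:
        if pending:
            pending = False
            if key in DEAD_KEY_MAP:
                out.append(DEAD_KEY_MAP[key])
            elif key == DEAD_KEY:
                out.append(DEAD_KEY)  # ";;" types a literal semicolon
            else:
                out.append(DEAD_KEY + key)  # no rule: pass both keys through
        elif key == DEAD_KEY:
            pending = True
        else:
            out.append(key)
    if pending:
        out.append(DEAD_KEY)  # trailing dead key with nothing after it
    return "".join(out)


print(apply_layout(";nk;o"))  # ŋkɔ
```

A dead key followed by an unmapped key passes both through unchanged. Real layout tools offer much richer rules, so this shows only the principle.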

2.3 Character utilities & virtual keyboards

Several sites include ways of finding or generating characters that can be copied and used in another application. These may be useful in creating documents or web content.

• BabelMap (Unicode Character Map Utility) http://www.babelstone.co.uk/Software/BabelMap.html

• Lexilogos claviers multilingues (multilingual keyboards; incl. Arabic & conversion utilities) http://www.lexilogos.com/clavier/multilingue.htm

• Unicode character pickers http://people.w3.org/rishida/scripts/pickers/
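Character pickers essentially let a user find a character by name or appearance and copy it. The same lookup is available programmatically. This Python sketch (helper names hypothetical) uses the standard unicodedata module to build a string from official Unicode character names and to list the code points of pasted text. This can help check that a character is the intended one and not a look-alike.

```python
import unicodedata
from typing import Iterable


def pick(names: Iterable[str]) -> str:
    """Build a string from official Unicode character names."""
    return "".join(unicodedata.lookup(name) for name in names)


def code_points(text: str) -> str:
    """Show each character's code point, e.g. for checking pasted text."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


letters = pick([
    "LATIN SMALL LETTER B WITH HOOK",  # used in Hausa, among others
    "LATIN SMALL LETTER D WITH HOOK",
    "LATIN SMALL LETTER OPEN E",
    "LATIN SMALL LETTER ENG",
])
print(letters)               # ɓɗɛŋ
print(code_points(letters))  # U+0253 U+0257 U+025B U+014B
```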


2.4 Keyboard standards

• ISO 9995

3. Fonts & rendering

3.1 Finding fonts

The following sites have information on Unicode fonts, including many that are useful for African languages:

• Alan Wood’s Unicode Resources: Unicode fonts for Windows computers http://www.alanwood.net/unicode/fonts.html

• Gallery of Unicode Fonts:
  o Ethiopic http://www.wazu.jp/gallery/Fonts_Ethiopic.html
  o Tifinagh http://www.wazu.jp/gallery/Fonts_Tifinagh.html

• Unifont.org Unicode Font Guide for Free/Libre Open Source Operating Systems http://unifont.org/fontguide/

• Freelang.com site (Arabic, Amharic; not clear which Latin ones are Unicode extended)
  o EN http://www.freelang.net/fonts/index.html
  o FR http://www.freelang.com/polices/index.html

• On this PanAfrL10n wiki: pages about Unicode fonts

3.2 Creating fonts

Before Unicode, needs in Africa for characters beyond those provided in 8-bit standards (like ISO-8859-1) were met by modifying fonts, that is, by changing the shapes of certain letters. This is no longer necessary. However, it is possible to create new fonts (or recode old ones) in Unicode encoding. This part lists some utilities (and lists of utilities) for that:

• Luc Devroye's page on Font creation programs (extensive list; not clear which are Unicode compliant) http://cg.scs.carleton.ca/~luc/editors.html

• The Freelang.com pages mentioned above offer a font creation service
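Text typed with a modified 8-bit font displays correctly only in that font, because the stored characters are still the standard ones whose shapes were redrawn. Converting such text to Unicode therefore amounts to a character-for-character substitution. The mapping below is hypothetical; each legacy font needs its own table, worked out from the font itself.

```python
# Hypothetical legacy font: these Latin-1 slots were redrawn to show
# African letters, so the stored text still contains the Latin-1 characters.
LEGACY_TO_UNICODE = str.maketrans({
    "\u00EB": "\u025B",  # ë slot drawn as ɛ
    "\u00CB": "\u0190",  # Ë slot drawn as Ɛ
    "\u00F6": "\u0254",  # ö slot drawn as ɔ
    "\u00D6": "\u0186",  # Ö slot drawn as Ɔ
    "\u00F1": "\u014B",  # ñ slot drawn as ŋ
    "\u00D1": "\u014A",  # Ñ slot drawn as Ŋ
})


def recode(legacy_bytes: bytes) -> str:
    """Decode legacy 8-bit text as Latin-1, then swap the redrawn slots."""
    return legacy_bytes.decode("latin-1").translate(LEGACY_TO_UNICODE)


print(recode(b"\xf1k\xf6"))  # ŋkɔ
```

Characters not listed in the table pass through unchanged, so plain ASCII text is unaffected.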

3.3 Other font information

• Microsoft Typography - Fonts and Products http://www.microsoft.com/typography/fonts/

4. Tags & locales

4.1 Language tags

• ISO 639-1 & 2 http://www.loc.gov/standards/iso639-2/php/English_list.php
• ISO 639-3 http://www.sil.org/iso639-3/codes.asp
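Language tags following BCP 47 (RFC 4646 at the time of writing) use the ISO 639-1 two-letter code when one exists and otherwise the three-letter code. Many African languages, such as Bemba and Tiv, have only three-letter codes. A sketch with an illustrative subset of codes follows; the table and function name are assumptions, and the lists above remain the authoritative source.

```python
from typing import Dict, Optional, Tuple

# Illustrative subset: (ISO 639-1 code or None, ISO 639-2/639-3 code).
ISO_639: Dict[str, Tuple[Optional[str], str]] = {
    "Amharic": ("am", "amh"),
    "Hausa": ("ha", "hau"),
    "Wolof": ("wo", "wol"),
    "Yoruba": ("yo", "yor"),
    "Zulu": ("zu", "zul"),
    "Bemba": (None, "bem"),
    "Tiv": (None, "tiv"),
}


def language_subtag(name: str) -> str:
    """Prefer the two-letter code; fall back to the three-letter one."""
    two_letter, three_letter = ISO_639[name]
    return two_letter if two_letter else three_letter


for name in ("Hausa", "Bemba"):
    print(name, language_subtag(name))
# Hausa ha
# Bemba bem
```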


4.2 Country tags

• ISO 3166-1
  o English country names and code elements http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
  o French country names and code elements http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-fr1.html
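Combining a language code with an ISO 3166-1 country code yields the identifiers used for locales. This matters for cross-border languages, since the same language may need a locale for each country where it is used. The sketch below assumes two common forms: the hyphenated form of language tags (ha-NG) and the underscore form with an optional codeset used by POSIX-style systems such as GNU libc (ha_NG.UTF-8). The function name is hypothetical, and only the shape of the codes is checked, not whether they are actually assigned.

```python
import re
from typing import List, Optional

LANGUAGE = re.compile(r"[a-z]{2,3}")  # shape of an ISO 639 code
REGION = re.compile(r"[A-Z]{2}")      # shape of an ISO 3166-1 alpha-2 code


def locale_ids(lang: str, regions: List[str], posix: bool = False,
               codeset: Optional[str] = None) -> List[str]:
    """Build one locale identifier per country for a (cross-border) language."""
    if not LANGUAGE.fullmatch(lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    ids = []
    for region in regions:
        if not REGION.fullmatch(region):
            raise ValueError(f"unexpected country code: {region!r}")
        if posix:
            # POSIX style: language_COUNTRY, optionally followed by .codeset
            suffix = f".{codeset}" if codeset else ""
            ids.append(f"{lang}_{region}{suffix}")
        else:
            # Language-tag style: language-COUNTRY
            ids.append(f"{lang}-{region}")
    return ids


print(locale_ids("ha", ["NG", "NE"]))  # ['ha-NG', 'ha-NE']
print(locale_ids("sw", ["KE", "TZ", "UG"], posix=True, codeset="UTF-8"))
```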

4.3 Locale data

• Common Locale Data Repository (CLDR) Project http://www.unicode.org/cldr/
• Locale Generator (aka LocaleGEN) v. 1.6 http://www.it46.se/localegen/
• Yeha - Locales for East Africa (may be out of date) http://yeha.sourceforge.net/
• GNU libc locale patch page http://www.hungry.com/~pere/linux/glibc/
• IBM International Components for Unicode demo on locales (lacks some languages in CLDR) http://demo.icu-project.org/icu-bin/locexp
• Microsoft "Locale Explorer" approaches to displaying locale information
  o Culture Explorer 2.0 (for MS .NET): site http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B778FF2C-9142-4769-839A-094F51A6F9F4 ; discussion http://blogs.msdn.com/michkap/archive/2006/12/20/1328701.aspx
  o The Locale Explorer (an older MS tool) http://www.flounder.com/localeexplorer.htm
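Locale data in CLDR is organised so that a specific locale inherits anything it does not define from more general ones, ending at a root locale. The following is a simplified Python sketch of that lookup. It assumes plain truncation of the identifier and ignores the explicit parent exceptions that CLDR also defines. The data entries are hypothetical placeholders, not CLDR content, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

LocaleData = Dict[str, Dict[str, str]]


def fallback_chain(locale_id: str) -> List[str]:
    """Most specific first: drop the last subtag until only root remains."""
    parts = locale_id.replace("-", "_").split("_")
    chain = []
    while parts:
        chain.append("_".join(parts))
        parts.pop()
    chain.append("root")
    return chain


def resolve(key: str, locale_id: str, data: LocaleData) -> Tuple[str, str]:
    """Return (value, locale that supplied it) for the first locale defining key."""
    for loc in fallback_chain(locale_id):
        if key in data.get(loc, {}):
            return data[loc][key], loc
    raise KeyError(key)


# Hypothetical placeholder data, not taken from CLDR.
DATA: LocaleData = {
    "root": {"quote_style": "value for root"},
    "ha": {"greeting": "value for ha"},
    "ha_NE": {"greeting": "value for ha_NE"},
}

print(fallback_chain("ha_Latn_NG"))        # ['ha_Latn_NG', 'ha_Latn', 'ha', 'root']
print(resolve("greeting", "ha_NG", DATA))  # ('value for ha', 'ha')
```

Under this scheme, a new locale for a language can start small and still work, since anything missing is taken from the more general locales.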

5. Software localisation & tools

5.1 FOSS

• OpenOffice (office suite) • AbiWord (word processor) • GRASS (GIS software) • Ubuntu (Linux-based operating system)

5.2 Tools

• Pootle (tool to facilitate translation of FOSS software) • Rosetta Translation Portal https://launchpad.net/rosetta
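Pootle and Rosetta work with gettext PO files, in which each entry pairs an original string (msgid) with its translation (msgstr). A simple measure of progress for a language is the share of entries that are translated and not marked "fuzzy" (i.e. flagged for review). This Python sketch assumes that entries are separated by blank lines, and it ignores plural forms, message contexts and escape sequences. The sample strings are illustrative Swahili, and the function names are hypothetical.

```python
from typing import List, Tuple

# Raw string so that the "\n" in the header stays as written in a PO file.
PO_SAMPLE = r'''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "File"
msgstr ""

msgid "Open"
msgstr "Fungua"

#, fuzzy
msgid "Save"
msgstr "Hifadhi"
'''


def _unquote(s: str) -> str:
    """Strip the surrounding double quotes of a PO string literal."""
    return s.strip()[1:-1]


def parse_po(text: str) -> List[Tuple[str, str, bool]]:
    """Return (msgid, msgstr, fuzzy) for each entry, skipping the header."""
    entries: List[Tuple[str, str, bool]] = []
    msgid = msgstr = None
    fuzzy = False
    field = None

    def flush() -> None:
        nonlocal msgid, msgstr, fuzzy, field
        if msgid:  # skips empty slots and the header (msgid "")
            entries.append((msgid, msgstr or "", fuzzy))
        msgid = msgstr = None
        fuzzy, field = False, None

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            flush()  # a blank line ends an entry
        elif line.startswith("#,") and "fuzzy" in line:
            fuzzy = True
        elif line.startswith("#"):
            continue  # other comments
        elif line.startswith("msgid "):
            msgid, field = _unquote(line[6:]), "msgid"
        elif line.startswith("msgstr "):
            msgstr, field = _unquote(line[7:]), "msgstr"
        elif line.startswith('"') and field == "msgid":
            msgid += _unquote(line)  # continuation of a multi-line msgid
        elif line.startswith('"') and field == "msgstr":
            msgstr += _unquote(line)  # continuation of a multi-line msgstr
    flush()
    return entries


def completeness(entries: List[Tuple[str, str, bool]]) -> float:
    """Percentage of entries with a non-empty, non-fuzzy translation."""
    if not entries:
        return 0.0
    done = sum(1 for _, translated, fuzzy in entries if translated and not fuzzy)
    return 100.0 * done / len(entries)


entries = parse_po(PO_SAMPLE)
print(f"{completeness(entries):.1f}% translated")  # 33.3% translated
```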

5.3 Meta-sites

• SourceForge.net http://sourceforge.net/index.php
• Microsoft Global Development & Computing Portal http://www.microsoft.com/globaldev/vista/vistahome.mspx


6. Website localisation & tools

• W3C:
  o Markup Validation Service, v0.7.4 http://validator.w3.org/
  o Site Index (many resources) http://www.w3.org/Consortium/siteindex
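For web content, the HTML lang attribute declares the language of a page and the dir attribute declares its base direction. The direction matters for right-to-left scripts such as Arabic and N'ko. The Python sketch below derives both attributes from a language tag, using an explicit script subtag when present and otherwise a default script per language. The default-script table and function names are illustrative assumptions. The resulting markup can then be checked with a validator such as the one listed above.

```python
from typing import Optional

# Assumed default scripts for a few language subtags (illustrative only).
DEFAULT_SCRIPT = {"ar": "Arab", "am": "Ethi", "ha": "Latn", "nqo": "Nkoo"}
# Right-to-left scripts among those used in Africa.
RTL_SCRIPTS = {"Arab", "Nkoo"}


def script_of(tag: str) -> Optional[str]:
    """Use an explicit four-letter script subtag if present, else a default."""
    parts = tag.split("-")
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            return part.title()  # normalise case, e.g. "arab" -> "Arab"
    return DEFAULT_SCRIPT.get(parts[0].lower())


def html_open_tag(tag: str) -> str:
    """Opening <html> tag with lang and a dir matching the script."""
    direction = "rtl" if script_of(tag) in RTL_SCRIPTS else "ltr"
    return f'<html lang="{tag}" dir="{direction}">'


print(html_open_tag("ar"))       # <html lang="ar" dir="rtl">
print(html_open_tag("ha-Arab"))  # <html lang="ha-Arab" dir="rtl">
print(html_open_tag("ha-NG"))    # <html lang="ha-NG" dir="ltr">
```

This reflects the point made for script codes: a plain "ar" needs no script subtag, whereas Hausa written in Arabic script (Ajami) is distinguished by an explicit one.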

7. Research & advanced applications

7.1 Translation tools

• "MT" for Africa: Computer translators & African languages http://www.bisharat.net/Trans/

7.2 Text-to-speech

• LLSTI
