19/08/05

Lost in translation: getting India’s languages online

Copyright: WHO/TDR/Crump


Imagine getting an email or accessing a website, and finding you cannot read it without downloading extra fonts. Speed — the heart and soul of internet access — suddenly becomes impossible.


This is a daily reality for millions in India, a country that has 18 official languages, 1,652 mother tongues (33 of them spoken by over 100,000 people), and dozens of different scripts (see table). 


Each Indian-language script differs from the others, and the scripts are written in different ways. Some languages, like Urdu and Sindhi, are written from right to left, others from left to right. Some, like Hindi, add extra flourishes that act as vowels or modify pronunciation.
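To make the difference concrete, here is a minimal sketch in Python of how software can tell these scripts apart. It assumes the text is encoded in Unicode, a standard this article does not discuss, and uses only Python's built-in unicodedata module. The function names and the two sample words ("Urdu" written in Urdu, "Hindi" written in Hindi) are chosen purely for illustration.

```python
import unicodedata

# Sample words, written with escapes so the code survives any font setup.
URDU_WORD = "\u0627\u0631\u062f\u0648"               # "Urdu" in Arabic script
HINDI_WORD = "\u0939\u093f\u0928\u094d\u0926\u0940"  # "Hindi" in Devanagari


def is_right_to_left(text):
    """Return True if any character belongs to a right-to-left class.

    "R" and "AL" are the Unicode bidirectional classes used for
    right-to-left letters (Hebrew-like and Arabic-like respectively).
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def combining_marks(text):
    """List the Unicode names of the marks attached to base letters.

    Categories "Mn" and "Mc" cover the signs that modify a consonant,
    which include the vowel 'flourishes' found in Hindi.
    """
    return [
        unicodedata.name(ch)
        for ch in text
        if unicodedata.category(ch) in ("Mn", "Mc")
    ]


if __name__ == "__main__":
    print(is_right_to_left(URDU_WORD))
    print(is_right_to_left(HINDI_WORD))
    for mark in combining_marks(HINDI_WORD):
        print(mark)
```

The sketch only classifies characters. Displaying them correctly still depends on fonts and rendering software, which is where the problems described in this article arise.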


Finding ways to make these native tongues available to computer users is vital to bringing communication and information technologies to India’s entire population.


Other non-English-speaking countries face similar problems. Pakistan, for instance, has been seeking software in its national language, Urdu, while Bangladesh wants solutions in Bengali.


Multinational corporations, arguably, are largely responsible for the problem. They rarely bother to translate software into local languages because of the lack of commercial gain: few of the people who speak them can afford expensive software.




Nine out of ten Indians do not speak fluent English, making computing in their native tongue essential
Photo Credit: www.fiveyards.com

Targeted localisation


South Asia isn’t alone. Dwayne Bailey at the non-profit organisation Translate.org.za says South Africa has similar problems.


Bailey and his team are translating ‘open source’ software, which can be freely distributed and modified, into all 11 South African languages: Afrikaans, English, Xhosa, Ndebele, Northern Sotho, Siswati, Southern Sotho, Tsonga, Tswana, Venda and Zulu.


All these languages are written using the Latin alphabet, so the task is not as complex as it is in India.


The golden rule, says Bailey, is that applications chosen for translation should be appropriate for a general audience. “Our logic is that [it should benefit] the people whom language would most affect,” he says. “Someone who can program could have probably mastered English already. Localisation must be aimed at the end-user.”


It also needs to take into account the needs of the people who use the software. Ravishankar Shrivastava, for instance, has been writing fiction in Hindi for two decades, but putting his written work into an electronic format that he could submit to publishers has proven difficult.


Shrivastava recalls his excitement when, in the late 1980s, he came across a personal computer that allowed him to type in Hindi. “I thought it was a gift to Hindi speakers,” he says.


But as computing technology progressed, new software that enabled users to write in their Indian mother tongues became more expensive – and also came with limitations.


Shrivastava tried several computer packages for writing in Hindi. Some were too time-consuming, demanding that you press several keys to type a single character. Other programs made it impossible to exchange text unless the person on the receiving end had the same software.


In India’s dotcom surge of the late 1990s, various Hindi newspapers went online. But they all used different, incompatible fonts.




Computer icons with Hindi application names on a PC desktop
Photo Credit: IndLinux

The open source connection


The open source avenue could be one way out of the problem. Cutting across traditional South Asian rivalries and distrust, groups of open source enthusiasts, some of them in India, are talking to each other about how they can collaborate to build solutions.


For Shrivastava, this involves the ‘Indian Linux’ or IndLinux project. Linux is an open source operating system that has been around since 1991. The people working on IndLinux want to tailor it to local Indian languages.


They have teams working on Bengali, Gujarati, Gurmukhi, Hindi, Kannada, Malayalam, Marathi, Oriya, Tamil and Telugu.


“We want to make technology accessible to the majority of India that does not speak English,” says G. Karunakar, a volunteer at IndLinux.


So far, the project has designed operating systems in Bengali and Tamil, and the Hindi version is nearly complete.


But there have been complications. For instance, the alphabets are laid out differently on different keyboards.


And even languages spoken by hundreds of millions, such as Hindi, were devoid of IT terms. When these terms were introduced, other difficulties arose. Take, for instance, the commonly used computer term ‘file’. This alone was called faeel, suchika, sanchika or reti by different translators.
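The terminology problem lends itself to a simple automated check. The sketch below, in Python, flags any English term that different translators have rendered in more than one way. It is a hypothetical illustration, not a tool the IndLinux project is known to use. The function name and data layout are assumptions, and only the four renderings of ‘file’ come from this article.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return the source terms that have more than one translation.

    `entries` is an iterable of (source_term, translation) pairs, one per
    translated string. Source terms are compared case-insensitively.
    """
    seen = defaultdict(set)
    for source_term, translation in entries:
        seen[source_term.lower()].add(translation)
    # Keep only the terms with competing translations, sorted for stable output.
    return {
        term: sorted(translations)
        for term, translations in seen.items()
        if len(translations) > 1
    }


# The four renderings of 'file' reported in the article. The 'folder'
# entries are placeholders, not real translations.
SAMPLE_ENTRIES = [
    ("file", "faeel"),
    ("File", "suchika"),
    ("file", "sanchika"),
    ("file", "reti"),
    ("folder", "placeholder-term"),
    ("folder", "placeholder-term"),
]

if __name__ == "__main__":
    for term, translations in find_conflicts(SAMPLE_ENTRIES).items():
        print(term, "->", ", ".join(translations))
```

A check like this only reveals the disagreement. Choosing which word to adopt remains a decision for the translators themselves.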


Another obstacle has been keeping volunteers involved in the translation task. Volunteers would join the venture with great enthusiasm, translate a dozen strings, and promise to do more, only to move on after realising that “translation is a tedious, thankless, glamourless, revenueless, highly boring job”, as Shrivastava puts it.


A rural revolution


Shrivastava believes that Indian-language computing will revolutionise a rural India where English is practically non-existent. States like Kerala and Madhya Pradesh are already introducing ‘e-governance’ projects based in local languages.


As part of Kerala’s campaign to familiarise its citizens with electronic communication, ‘e-centres’ are being set up throughout the state. These will be connected to the Internet and linked through to a central operating centre. The goal is for at least one person in each family in Kerala to become computer-literate.


Meanwhile, Madhya Pradesh is introducing an online initiative called ‘Gramsampark’, meaning ‘village contact’. The website offers information, in local languages, on how the state is governed, and is available to all 51,000 villages in Madhya Pradesh.


Getting the message out


Karunakar believes IndLinux’s major challenge will be to make sure the work is widely used. This means identifying the actual users, reaching them, and finding those who stay away from computers because of language barriers.


“We need to properly package the whole thing in a simple, installable format and easy-to-use interface,” he says.


Some Indian languages, such as Urdu, Kashmiri, Konkani, Manipuri and Sindhi, have yet to be tackled by the open source taskforce.


Looming challenges


There is a lot of work ahead for the IndLinux team. For now, translations of the basic interface into several local languages have been completed. Next, the volunteers need to work on user manuals, ‘help’ files and more.


Many hands make light work, and IndLinux’s current band of volunteers believe there are too few of them involved in this ambitious task. Beyond that, they need support — both financial and moral.


Surprisingly, rather than one of the more important, government-supported languages such as Hindi, the south Indian language Tamil was the first to be localised through open source. It was only then followed by Hindi.


Those working in Indian-language computing suggest the speedy translation of software into Tamil might also be explained, in part, by the work put in by expatriate Tamil-speaking communities settled in places like Malaysia, North America, Singapore and Sri Lanka.


Om Vikas, head of the Indian government’s Human Centred Computing Division at its Ministry of Communication and Information Technology in New Delhi, says that IndLinux has also spawned Indix2 — a compact disc of Indian-language software solutions — that supports 11 languages. 


Vikas says the effort’s weak point is that there is no single national-level consortium pushing it forward and creating standards.


He says creating such a consortium should be a top priority, and that part of its role should be to deploy the translated products immediately in schools under central government administration.


Vikas says implementation efforts are slow, but “satisfactory” for more important languages like Hindi, Tamil and Bengali.


Overall, it’s a tall order. But if India is to maintain its position as one of the IT leaders in the world, it has no choice but to win this battle.


Language      Number of speakers

Hindi         340 million
Bengali       70 million
Telugu        66 million
Marathi       62 million
Tamil         53 million
Urdu          43 million
Gujarati      40 million
Kannada       32 million
Malayalam     30 million
Oriya         28 million
Punjabi       23 million
Assamese      13 million

Source: Malayala Manorama Yearbook