Thailand celebrates National Children’s Day on the second Saturday of January. This year, to mark the occasion, Asia Online introduced its Thai-language translation of Wikipedia. We spoke with company CEO Dion Wiggins and vice president Kirti Vashee about the project.
In 2008, we wrote about Asia Online’s plans to increase the availability of content for Asian communities (see “Asia Online Stakes Portal Success on MT,” Apr08). At the time, we noted how little local-language web content existed relative to Asia’s swelling internet population. Recognizing that human translators couldn’t close the language gap, Asia Online proposed to use its machine translation (MT) technology to shift the balance. Last Friday, in the run-up to Children’s Day, Thailand’s Prime Minister Abhisit Vejjajiva announced the launch of the complete English-language Wikipedia in Thai. It was presented as a gift of knowledge to the children of Thailand from the government and its local business partners, CAT Telecom and Asia Online, all of which contributed funding to the project.
According to Wiggins, the launch event wouldn’t have been out of place at the Consumer Electronics Show. It was staged with lots of TV coverage, videos, lighting effects, and dignitaries including the Prime Minister and the Minister of Information and Communications Technology. Rightfully so, since this site immediately comprises a huge chunk of the Thai web. The audience also included 100 children, representing the younger generations of Thais who will benefit from this massive translation of content into their native language.
Wiggins told us that within days, his team will have completed the translation of 3.5 million Wikipedia articles into Thai. While this project has been on Asia Online’s agenda for the past three years, developing the core technology, refining the MT engine and its training corpora, and growing the company’s commercial presence took precedence. Wiggins claims that the actual translations were carried out in just one week, with more than a billion words of machine-translated content passing through the system’s 50 quad-core servers. The infrastructure and tools that Asia Online built for the Thai Wikipedia will scale to other content sources and to languages such as Bahasa Indonesia, Hindi, and Malay.
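To put those figures in perspective, here is a rough back-of-the-envelope sketch of the throughput they imply. It uses only the numbers quoted above (a billion words, one week, 50 quad-core servers) and assumes, purely for illustration, that the servers ran around the clock and that the load was spread evenly. Neither assumption comes from Asia Online, and the function and variable names are our own.

```python
# Back-of-the-envelope throughput implied by the figures Wiggins quoted.
# Assumptions (ours, not Asia Online's): continuous 24x7 operation and an
# even spread of work across all servers and cores.

WORDS_TRANSLATED = 1_000_000_000      # "more than a billion words" (lower bound)
SERVERS = 50                          # quoted server count
CORES_PER_SERVER = 4                  # "quad-core"
SECONDS_PER_WEEK = 7 * 24 * 60 * 60   # one week of wall-clock time


def throughput(words, seconds, servers, cores_per_server):
    """Return (total, per-server, per-core) rates in words per second."""
    total = words / seconds
    per_server = total / servers
    per_core = per_server / cores_per_server
    return total, per_server, per_core


total, per_server, per_core = throughput(
    WORDS_TRANSLATED, SECONDS_PER_WEEK, SERVERS, CORES_PER_SERVER
)

print(f"Overall:    {total:,.0f} words/second")
print(f"Per server: {per_server:.1f} words/second")
print(f"Per core:   {per_core:.1f} words/second")
```

Under these assumptions the system as a whole would have sustained roughly 1,650 words per second, or about 33 words per second per server and about 8 per core. Since the quoted word count is a lower bound and the servers may not have run continuously, the real rates were probably higher.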
What comes next is crowdsourced proofreading. Asia Online has 10 proofreaders on its payroll who will focus on fixing the most popular articles, but as Vashee explained, “we will add social networking components over time to enable discussions of the content and nurture the creation of more Thai-focused content.” That will allow any of the country’s nearly 18 million internet users to add original information and suggest corrections to the translations. Wiggins said that the proofreading edits will feed back into Asia Online’s English-to-Thai (EN>TH) translation engine, and any articles that have not been reviewed will benefit from periodic re-translations that leverage the crowd’s input. The system will take advantage of MT quality enhancement tools that the company introduced in 2009.
How big a deal is this? With over a billion words being translated across a wide range of topics, this is quite possibly the single largest translation project currently underway (except maybe for the Wycliffe Last Languages Campaign). By way of an apples-to-oranges comparison, SDL’s hosted enterprise servers process 2.5 billion words of human and machine translation per year. What’s most significant here is that translating this much content for a country with Thailand’s population and gross domestic product could never be justified on purely commercial grounds. Asia Online’s machine translation of Wikipedia into Thai paves the way for more local-language content on the increasingly multilingual web.