
Can AI interpret ancient texts? : Algorithm could help speed up translation of Joseon court records

Feb 20, 2017
Artificial intelligence will be used to speed up translation of the Seungjeongwon Ilgi, one of the world’s most voluminous publications, the Institute for the Translation of Korean Classics confirmed on Tuesday. The collection is a Korean National Treasure and part of Unesco’s Memory of the World Register.
The Seungjeongwon Ilgi, written between 1623 and 1910, records every single move of the Joseon Dynasty’s kings, including what they did, what they said and where they went. [CULTURAL HERITAGE ADMINISTRATION]
Between 1623 and 1910, a royal scrivener in the Joseon court would follow the king around and document his every move - what he did, what he said, where he went, even how he felt.

The detailed records were kept in a collection of books called the Seungjeongwon Ilgi, which roughly translates to the diaries of the royal secretariat. And in true diary form, each entry began with descriptions of the day’s weather, both in the morning and afternoon. There are known to be about 100 different ways that weather was described in the Seungjeongwon Ilgi. Such meticulous attention to the minutiae of daily life explains the collection’s massive volume.

Although the diaries were kept from the early years of the Joseon Dynasty (1392-1910), a fire during the 16th-century Japanese invasion known as the Imjin War (1592-98) and a revolt in 1624 destroyed all the records from the beginning up until the reign of Prince Gwanghae (1575-1641). That’s about half the content.

Even with what’s left, the whole collection includes 3,243 books and 242.5 million Chinese characters (royal documents used Chinese characters even after hangul was adopted as a writing system in 1446). Had the earlier part survived, historians assume the total would have reached about 500 million characters. For comparison, the 24 Histories, the Chinese history books covering the period from 3000 B.C. to the 17th century, contain 40 million characters.

The Seungjeongwon Ilgi is considered the biggest single publication in the world, one of the reasons it made the list of Unesco’s Memory of the World Register. But that size also makes translation an arduous task. The records are not only written in Chinese characters but also in archaic language.

In 1994, the Institute for the Translation of Korean Classics began an ambitious project to translate the collection. About 40 experts have been working on it. Still, over the past 23 years, less than 20 percent of the Seungjeongwon Ilgi has been translated. At this rate, it may take another 45 to 50 years to translate the whole thing, and even that would produce only a rough draft.

Last week, there were reports that the Ministry of Science, ICT and Future Planning would fund a project to use artificial intelligence to speed up translation of the Seungjeongwon Ilgi. If successful, the whole collection could be translated in 18 years, according to reports. The budget is reportedly around 2 billion won ($1.76 million), and the first tangible result of the AI-led translation effort could come out around December.
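The timelines quoted above can be compared with a little arithmetic. The sketch below is only a back-of-the-envelope check, not the institute's method: the article does not say how the 45-to-50-year or 18-year estimates were derived. It treats "less than 20 percent" as exactly 20 percent (an assumption) and assumes a constant pace. On those assumptions, the average pace of the past 23 years would need roughly 92 more years, so the quoted human-only estimate presumably assumes faster work than the historical average.

```python
# Figures quoted in the article. "Less than 20 percent" is treated as
# exactly 20 percent here, which is an assumption made for illustration.
YEARS_ELAPSED = 23      # 1994 to 2017
FRACTION_DONE = 0.20    # assumed share of the collection already translated


def average_pace(fraction: float, years: float) -> float:
    """Share of the whole collection translated per year."""
    return fraction / years


def years_to_finish(fraction_done: float, pace: float) -> float:
    """Years needed to translate the remainder at a constant pace."""
    return (1.0 - fraction_done) / pace


remaining = 1.0 - FRACTION_DONE
historical_pace = average_pace(FRACTION_DONE, YEARS_ELAPSED)
naive_years = years_to_finish(FRACTION_DONE, historical_pace)

# Pace implied by each quoted timeline for the remaining 80 percent.
pace_human_slow = average_pace(remaining, 50)
pace_human_fast = average_pace(remaining, 45)
pace_with_ai = average_pace(remaining, 18)

print(f"Historical pace:            {historical_pace:.2%} per year")
print(f"Years left at that pace:    {naive_years:.0f}")
print(f"Pace implied by 45-50 yrs:  {pace_human_slow:.2%} to {pace_human_fast:.2%} per year")
print(f"Pace implied by 18 yrs:     {pace_with_ai:.2%} per year")
```

The comparison is only as good as the 20 percent figure; if the share already translated is lower, every pace implied by the quoted timelines rises accordingly.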

A spokesman at the Institute for the Translation of Korean Classics confirmed to the Korea JoongAng Daily on Tuesday that the project will happen. The ministry is expected to make an official announcement through a press release at the end of the month.

Choi Yeong-rok of the institute’s external affairs division said that AI-powered translation of classics has never taken place in Korea. “There is a known precedent in China, but that’s different because in the case of Korean classics, it’s translating Chinese characters into Korean.”

Asked why the Seungjeongwon Ilgi was chosen, Choi said many people were waiting to see the collection be translated into a version that modern Koreans can actually read and understand.

Media reports speculated that neural machine translation technology would be used. The technology is based on an artificial neural network that works on whole sentences at once and learns the most likely translation from large amounts of example data. In essence, the system is loosely modeled on the way a human brain works.

Human translators will input the corpus (the collection of written texts) based on their translations so far of the publication to create the big data, and based on the data, the AI algorithm will translate the remaining parts of the Seungjeongwon Ilgi.
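As a rough illustration of what "inputting the corpus" could involve, the sketch below pairs each original passage with its existing human translation and sets a small share aside for checking machine output later. It is a minimal sketch under stated assumptions, not the institute's pipeline: the names `PassagePair`, `build_parallel_corpus` and `split_corpus`, the 10 percent held-out share and the placeholder strings are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PassagePair:
    source: str  # original passage written in Chinese characters
    target: str  # existing human translation into modern Korean


def build_parallel_corpus(sources, translations):
    """Pair each original passage with its human translation."""
    if len(sources) != len(translations):
        raise ValueError("every source passage needs exactly one translation")
    pairs = []
    for src, tgt in zip(sources, translations):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # skip passages that are empty on either side
            pairs.append(PassagePair(src, tgt))
    return pairs


def split_corpus(pairs, held_out_fraction=0.1, seed=0):
    """Shuffle reproducibly, then hold some pairs back for evaluation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_held_out = int(len(shuffled) * held_out_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Placeholder strings stand in for real passages and translations.
sources = [f"source passage {i}" for i in range(10)]
translations = [f"human translation {i}" for i in range(10)]

corpus = build_parallel_corpus(sources, translations)
training_pairs, held_out_pairs = split_corpus(corpus)
print(len(training_pairs), "pairs for training,", len(held_out_pairs), "held out")
```

A translation model would be trained on the first group only; comparing its output on the held-out passages against the human translations gives one way to estimate how much proofreading the machine drafts need.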

How accurate the AI translation will be remains uncertain. Over the Joseon Dynasty’s 500-year run, the language changed drastically. Lee Myung-hak, the institute’s head, wrote in a column in 2015 that translation of the Seungjeongwon Ilgi was “impossible without the knowledge of the periods’ politics, economy, society and culture.” That is why most experts say AI translation of the publication will be rough and even inaccurate. Human experts will have to proofread it intensively several times, which could take considerable time.

Nonetheless, there are pundits who say the attempt itself could be meaningful given the Seungjeongwon Ilgi’s high regard among filmmakers, screenwriters and novelists as a treasure trove of inspiration for new films, dramas, novels and other creative materials. Just from the 20 percent translated, hit historical films and dramas like Yi San (2007-8) and Masquerade (2012) have been made.

In an interview, one senior translator who worked on the Seungjeongwon Ilgi compared the book to a dragon, in reference to its potential power. “In order to wake a dragon, the Yeouiju [a magical orb] is necessary,” Ha Seung-hyeon said.

“In this case, the Yeouiju is translation.”

BY KIM HYUNG-EUN [hkim@joongang.co.kr]