
Make information accessible

Apr 07, 2018
On March 29, the assets of 2,249 high-level government officials were released. The government, the Supreme Court and the National Election Commission published the lists in the official gazette, and the National Assembly and the Constitutional Court released the data via press releases.

The Public Service Ethics Act requires high-level officials to register and release their assets. The purpose is to prevent unlawful accumulation of wealth and secure fairness in public duty.

It allows citizens to keep track of whether public servants unlawfully make money while doing government business or prioritize personal interests over the public interest.

On the day of the release, the JoongAng Ilbo provided readers with aggregated data from the different agencies online. I thoroughly reviewed who was included in the list of officials, from the first to the 2,249th, and whether there were any irregularities.

But the process was not smooth. The first obstacle was that the data was provided in PDF format. The strength of the format is the high security of the document, as it cannot be modified. But this is a merit from the supplier’s perspective. It is just the opposite from the user’s point of view.

The documents totaled 4,054 pages, and excluding the titles, there were 42,244 rows of actual data. To analyze the data, the tables in each document needed to be combined, but the portable document format (PDF) did not allow it, and spreadsheets cannot read PDFs. So all of the files needed to be converted to comma-separated values (CSV), a format that spreadsheets can read. Even after the conversion, a problem remained.
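To give a sense of what combining the tables involves once they are in CSV form, here is a minimal sketch in Python. It is only an illustration, not the program the newspaper used. The column names and sample rows are hypothetical, and it assumes every converted table starts with the same header row.

```python
import csv
import io

def combine_tables(csv_texts):
    """Merge several CSV tables that share one header row into a single list of rows."""
    combined = []
    header = None
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty tables
        if header is None:
            # Keep the header from the first non-empty table only.
            header = rows[0]
            combined.append(header)
        # Drop each table's repeated header row and any blank rows.
        combined.extend(row for row in rows[1:] if row)
    return combined

# Hypothetical sample data standing in for two converted documents.
agency_a = "name,asset_type,value\nOfficial A,deposit,1000\n"
agency_b = "name,asset_type,value\nOfficial B,land,2500\n"

merged = combine_tables([agency_a, agency_b])
for row in merged:
    print(row)
```

In practice each table would be read from a converted file rather than from a string, but the merging logic would stay the same.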

The original tables were written in various formats, so they needed to be modified in order for calculations to be made.

Of course, there is a way. A computer program can be created so that the raw data can be cleaned up. That’s how the JoongAng Ilbo worked on the data.
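As a rough idea of what such a cleanup program looks like, the Python sketch below turns inconsistently written amounts into numbers that can be added up. It is not the JoongAng Ilbo's actual code. The input formats shown (thousands separators, parentheses for negative values, a currency sign, a dash for an empty cell) are assumptions made for illustration, and the function assumes whole-number amounts with no decimals.

```python
import re

def parse_amount(raw):
    """Convert a loosely formatted amount string to an int, or None if the cell is empty."""
    text = raw.strip()
    if text in ("", "-"):
        return None  # treat blanks and a lone dash as "no value"
    # Assumed conventions: "(5,000)" or "-5,000" means a negative amount.
    negative = (text.startswith("(") and text.endswith(")")) or text.startswith("-")
    # Remove everything except digits: commas, currency signs, spaces, brackets.
    digits = re.sub(r"[^\d]", "", text)
    if not digits:
        return None
    value = int(digits)
    return -value if negative else value

# Hypothetical cells as they might appear after conversion to CSV.
samples = ["1,234,567", " 98000 ", "(5,000)", "-", "₩2,000"]
cleaned = [parse_amount(s) for s in samples]

# Only now can the values be summed like spreadsheet numbers.
total = sum(v for v in cleaned if v is not None)
print(cleaned, total)
```

Even this small example needs several rules, and real tables tend to call for many more.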

But how many average citizens can do that? In the end, releasing public servants’ assets as PDFs means that citizens can read what they are given, but they cannot do anything further with the information.

In fact, there have been worse cases. I requested information from the Election Commission during the presidential election last year and received PDF files made from scanned documents. They were pictures, not documents. The only way to use this kind of data was manual data entry.

In the so-called fourth industrial revolution era, it is nonsense to release public data in a PDF format that cannot be read by computers. Let’s give up on favoring the perspective of the supplier, unless the government intends to take credit for merely releasing data.

JoongAng Ilbo, April 6, Page 30

*The author is the head of the digital contents lab at the JoongAng Ilbo.