Most ancient Oriental books of great importance to the study of the Orient were written in Chinese characters. Information recorded in Chinese characters is usually hard to process, because most of these books are voluminous and have no indices of their content. In Korea, the history of the Lee Dynasty was recorded over 472 years, from the first king, Tae-jo, to King Chul-jong. Since it contains more than 60 million Chinese characters, finding the locations of a specific sentence or set of characters, such as the name of a person or an event, is a difficult problem.
This paper considers a method to store the entire content of the Lee Dynasty Chronicle (Chosun-Wangjo-Sillok) and to find the locations of an item. With a 3-byte Chinese character code, the total file size is about 500 MB including the index file, so a single disk unit, such as an IBM 3370, is sufficient to hold the retrieval system. Retrieving the locations of an item requires only about 50,000 comparisons in main memory on average, which would take several minutes.
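
As a rough illustration of the figures above, the following Python sketch computes the raw text size implied by a 3-byte code and shows one possible way to locate an item through a per-character position index. It is not the paper's method, which the abstract does not describe: the function names `raw_text_size_mb`, `build_index` and `find_item`, the choice of starting from the rarest character, and the sample text are all assumptions made for illustration.

```python
from collections import defaultdict

BYTES_PER_CHAR = 3  # 3-byte character code, as stated in the text


def raw_text_size_mb(num_chars, bytes_per_char=BYTES_PER_CHAR):
    """Size of the text alone, in MB (10**6 bytes), excluding any index."""
    return num_chars * bytes_per_char / 1_000_000


def build_index(text):
    """Map each character to the sorted list of positions where it occurs.

    Hypothetical index layout, assumed for illustration only.
    """
    index = defaultdict(list)
    for pos, ch in enumerate(text):
        index[ch].append(pos)
    return index


def find_item(text, index, item):
    """Return every start position of `item` in `text`.

    Only positions of the item's rarest character are checked, so the
    number of comparisons depends on that character's frequency rather
    than on the length of the whole text.
    """
    if not item:
        return []
    # Pick the character of the item that occurs least often in the text.
    offset, rare = min(enumerate(item),
                       key=lambda pair: len(index.get(pair[1], ())))
    hits = []
    for pos in index.get(rare, ()):
        start = pos - offset
        if start >= 0 and text[start:start + len(item)] == item:
            hits.append(start)
    return hits


# Made-up sample text, not a quotation from the Chronicle.
sample = "太祖李成桂太宗李芳遠"
sample_index = build_index(sample)
locations = find_item(sample, sample_index, "李成桂")
```

Under these assumptions, 60 million characters at 3 bytes each come to 180 MB of text, which leaves the remainder of the stated 500 MB total for the index file and any other overhead.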