Multilingual Communications and Technology

KanjiScan, Version 1.0

A powerful and flexible Japanese OCR

From experience, I can tell you that electronically archiving printed documents is a difficult chore. With Japanese documents the task is even more arduous. You need OCR software that will quickly and accurately recognize poorly or incorrectly scanned Kanji and Kana characters. If you're using a non-Japanese PC, it's difficult to find OCR software that will run without any special add-ons. KanjiScan from Neocor Tech offers powerful Japanese OCR capabilities in a tight, easy-to-use package that runs on Windows 95 or NT. In fact, I was pleasantly surprised at the software's power and ease of setup and use.

Neocor claims KanjiScan can read up to 100,000 characters per hour. Under optimal conditions, this is probably not far off. However, I found that the speed of scanning varied with the quality of the document.

The professionally typeset material I scanned went through fairly quickly and with few errors. As can be expected, newspaper articles were the slowest. With documents printed at 600 dpi on a laser printer (letters, Japanese Web pages and some old vocabulary lists), scanning took a lot longer even though the documents were relatively short. On top of that, there were a number of misinterpreted characters.

Once you've scanned a document, you can begin the OCR process. You can either automatically OCR each block of text or manually select a block to recognize and correct. KanjiScan can, with a high degree of accuracy, point out incorrectly scanned characters. When it comes across one, a list of possible matches pops up. I deliberately smudged a few kanji on some of the pages I scanned and most of the time the correct match was at or near the top of the list.

KanjiScan's OCR dictionary can also learn characters and compounds. The recognition dictionary builds a list of kanji and kana with each document scanned. I was skeptical about this at first. But the more pages I put through the software, the more accurate the OCR function became. The software allows you to build separate dictionaries for the different types of documents you scan, such as correspondence, technical documents, newspapers and magazines. This allows the program to best recognize the characters for the type of document being scanned.

The program offers two other ways to correct scanned text. First, KanjiScan has a powerful Kanji Search System. This allows you to select characters by entering information about their stroke count and radical type. The system then offers you a list from which you can choose the correct character. This is an excellent function if you're familiar with using a character dictionary like Nelson's or Halpern's. Second, you can correct scanned text using the software's built-in front-end processor (FEP). However, I found the FEP limiting. While it isn't very difficult to use, it doesn't have many of the features I'm used to. I was far more comfortable pulling the scanned text into my favorite Japanese word processor for editing.
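For readers who haven't used a character dictionary, the idea behind a stroke-count-and-radical lookup like the Kanji Search System can be sketched in a few lines of Python. This is only an illustration of the general technique, not KanjiScan's actual implementation: the `KanjiEntry` type, the `KANJI_TABLE` data and the `search_kanji` function are all hypothetical names, and the table holds just a handful of characters.

```python
from typing import List, NamedTuple


class KanjiEntry(NamedTuple):
    char: str      # the character itself
    strokes: int   # total stroke count
    radical: str   # the radical the character is filed under


# Tiny illustrative table (hypothetical data set; a real character
# dictionary would hold thousands of entries).
KANJI_TABLE = [
    KanjiEntry("休", 6, "人"),
    KanjiEntry("体", 7, "人"),
    KanjiEntry("作", 7, "人"),
    KanjiEntry("村", 7, "木"),
    KanjiEntry("林", 8, "木"),
    KanjiEntry("校", 10, "木"),
    KanjiEntry("語", 14, "言"),
    KanjiEntry("話", 13, "言"),
    KanjiEntry("読", 14, "言"),
]


def search_kanji(strokes: int, radical: str, tolerance: int = 0) -> List[str]:
    """Return candidates filed under `radical` whose stroke count is
    within `tolerance` of `strokes`, closest stroke count first."""
    matches = [
        entry for entry in KANJI_TABLE
        if entry.radical == radical and abs(entry.strokes - strokes) <= tolerance
    ]
    # Stable sort: exact stroke-count matches come before near misses.
    matches.sort(key=lambda entry: abs(entry.strokes - strokes))
    return [entry.char for entry in matches]


if __name__ == "__main__":
    # Seven strokes, filed under the "person" radical.
    print(search_kanji(7, "人"))
    # Allow for a miscounted stroke when the printed character is smudged.
    print(search_kanji(14, "言", tolerance=1))
```

The tolerance parameter is an assumption of this sketch; it is there because miscounting a stroke is the usual way such a lookup fails, which is one reason the method is hard for people who don't read Japanese.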

Among KanjiScan's unique features is the ability to recognize English text as well as vertical and horizontal Japanese text. In practical terms, this means you can have bilingual OCR capabilities in a single package. Another useful feature is the program's image processing capabilities. When a document is scanned, it's stored as an image file. As can be expected, the quality of the image can vary. KanjiScan's image manipulation tools rival those found in many popular graphics packages.

My main problem with KanjiScan doesn't have to do with the program itself, but with Neocor's claims. Neocor says you don't need to know any Japanese to use the product. This is only true up to a point. Simply scanning and saving the results can be done by anyone, as the interface is in English. Editing the scanned material is a different story. The manual contains a detailed explanation of how people who don't know any kanji or kana can correct missing characters using the Kanji Search System. I had a couple of unilingual friends try, but even with instructions they weren't able to do it.

Overall, I found KanjiScan superior to most of the English OCR software I've used. Anyone requiring Japanese OCR capabilities will find KanjiScan's features and flexibility more than a match for any document scanning task. The system may require a little tweaking and practice to get it working the way you want it to, but in the end you'll find that KanjiScan is one program you'll wonder how you ever did without.

- Scott Nesbitt


Ne.o.cor.tex : n. The dorsal region of the cerebral cortex.