
Scanning and OCR Services

Scanning is the process of using hardware to create digital images from paper documents. OCR (Optical Character Recognition) is the process of using software to turn digital images into editable text. Together, they let you turn your paper documents into reusable formats such as MS Word files, PDF files, or HTML pages. OCR can also turn image-based PDF files back into editable text.

Topics: Accuracy | Cost | Volume | Document Exchange

How Accurate Is OCR?
There are many factors that contribute to the success of an OCR project, and all of them inevitably affect the cost of the service.

  • The most important factor is the print quality of the documents that you send us. For example, a page printed on white paper by a 600-dpi laser printer will be recognized almost perfectly, while a fax-quality image containing blurry or shadowy text will not.

  • Assuming good print quality, the second major factor is the nature of the text. For instance, regular text (e.g., "The quick brown fox jumped over the lazy dog.") is ideal and results in near-perfect recognition. However, text containing special characters, bullet symbols, non-English words, chemical formulas, math symbols, subscript/superscript characters, etc. is much more difficult to OCR and requires much more manual labor.

  • The final factors we consider are page layout and binding. The golden rule is: Simpler is Better. A single-column, text-only page is much easier to process than a multi-column catalog page that contains text wrapped around images. Likewise, single-sided loose-leaf pages are easier to process than double-sided bound volumes.

This does not mean that difficult pages cannot be OCR'd and converted. It simply means that the process can range from simple to very difficult.

What Does It Cost?
We cannot give a firm quote on any OCR project until we see the entire document; however, we can usually provide an estimate based on a few sample pages. In most cases, we will ask you to fax over a few "representative" pages. The base cost for an OCR job can range from $0.30 per page to several dollars per page, depending on the nature of the document and the final file format desired (e.g., PDF, .doc).

"I Have Ten Zillion Pages to Scan..."
RSA is not a large-volume OCR shop. Our services are aimed at small companies and colleges that have jobs no larger than a few thousand pages. If you are unsure whether your job fits, please contact us.

How Do I Send Documents to RSA?
For small jobs (under 20 pages), if the pages are fairly clean and contain text only, you can fax the entire job to us. Our fax machine produces very clean output. Of course, if your fax machine is of poor quality, that won't help, because the images will be distorted as you send them. Still, it's worth a try: fax us a few sample pages and we'll let you know. Otherwise, you will need to send us the documents by mail, using the mailing address on our Contacts page. Always notify someone at RSA before sending documents!
