Degraded Image Enhancement Processing allows government agencies and commercial companies to quickly identify and remove degradations from text document images using a high-speed software and hardware implementation. Degraded Image Enhancement Processor (DIEP) technology automatically removes multiple image degradations and keeps pace with high-speed Handwritten Optical Character Recognition (HOCR), printed Optical Character Recognition (OCR), and Machine Translation (MT). With DIEP technology, handwritten documents degraded by environmental conditions, primitive media, or mishandling can be enhanced quickly and easily.
For example, DIEP corrects photocopied images that may have several degradations, including backflash, skewed text, and non-text objects. It also handles handwriting degradations: it corrects non-uniform grayscales, identifies and removes non-linear slant, and adjusts pen thickness for quick processing. DIEP can create specialized enhancement processing for specific degraded document types, languages, and applications. Given the sheer volume of documents that are recovered, the assortment of image degradations often makes it difficult to take advantage of the information they contain quickly. Before Optical Character Recognition (HOCR and OCR) and Machine Translation (MT) can be performed, all degradations must be removed from the documents. Removing them is often very time-consuming and lacks the required accuracy. In today's environment, a high-speed solution is required to process the volume of documents. DIEP is a software and hardware (Field Programmable Gate Array, FPGA) solution that can reduce 20 hours of software processing to only 20 minutes, roughly a 60-fold speedup.
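As an illustration of one degradation mentioned above, skewed text, the sketch below estimates a page's skew angle with a projection-profile search, a common textbook technique. It is not a description of how DIEP works internally. The function names, the whole-degree angle grid, and the synthetic test page are all assumptions made for this example.

```python
import math


def estimate_skew_degrees(pixels, max_angle=10, step=1):
    """Estimate text skew with a projection-profile search (illustrative only).

    pixels: iterable of (x, y) coordinates of ink pixels.
    Returns the candidate angle, in whole degrees, whose de-skewed row
    histogram is most sharply peaked.
    """
    pixels = list(pixels)
    best_angle, best_score = 0, -1
    for angle in range(-max_angle, max_angle + 1, step):
        slope = math.tan(math.radians(angle))
        rows = {}
        for x, y in pixels:
            # Undo the candidate skew, then count ink pixels per row.
            row = round(y - x * slope)
            rows[row] = rows.get(row, 0) + 1
        # Straight text lines concentrate ink in few rows, so the sum of
        # squared row counts is largest at the correct angle.
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_lines(angle_degrees, width=200, baselines=(50, 80)):
    """Hypothetical test page: two one-pixel-thick text lines at a known skew."""
    slope = math.tan(math.radians(angle_degrees))
    return [(x, round(b + x * slope)) for b in baselines for x in range(width)]


estimated = estimate_skew_degrees(synthetic_lines(3))
```

Once an angle is estimated, the page would be rotated by the opposite angle before recognition. A production system would need a finer angle grid and real scanned input rather than synthetic lines.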
Automatic Retrieval and Intelligent Search Technology using Optical Words (ARISTOW) is a state-of-the-art application developed by CiyaSoft for government agencies and commercial companies to quickly search inside printed and handwritten Arabic and English document images. ARISTOW development started in 2005 to give users the capability to search and retrieve information from millions of English and Arabic printed and handwritten document images stored electronically. Given a query document image, ARISTOW retrieves the closest matches from a set of scanned handwritten documents, a capability needed in applications such as scientific notes, historical manuscripts, personal records, web pages, criminal records, and other electronic printed and handwritten documents. In each of these applications there is a need for indexing and retrieval based on textual content as well as user-indexed terms. Writer characteristics, textual content, and writer profile all influence how the system indexes and retrieves a document. Documents are indexed using global features such as stroke width, slant, and word gaps, as well as local features that describe the shapes of characters and words. Image indexing is completed automatically using page analysis, page segmentation, line separation, word segmentation, and recognition of characters and words. There is a need to search a database of handwritten documents not only for textual content but also for visual content such as writer characteristics. As a document management system for handwritten documents, ARISTOW provides several functions: image indexing for content-based retrieval, keyword searching in a digital library of handwritten documents, and interactive document analysis.
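To make the idea of retrieving the closest matches from global features concrete, here is a minimal sketch that ranks indexed documents by distance to a query in a small feature space. It assumes each document has already been reduced to three numbers (stroke width, slant, word gap). The feature names, the sample values, and the range-normalized Euclidean distance are assumptions for illustration, not ARISTOW's actual index or similarity measure.

```python
import math


def rank_by_similarity(query, index):
    """Return document ids sorted from closest to farthest from the query.

    query: dict mapping feature name to value.
    index: dict mapping document id to a feature dict with the same keys.
    """
    keys = sorted(query)
    # Scale each feature by its range so that no single unit dominates.
    spans = {}
    for key in keys:
        values = [features[key] for features in index.values()] + [query[key]]
        spans[key] = (max(values) - min(values)) or 1.0

    def distance(features):
        return math.sqrt(
            sum(((features[key] - query[key]) / spans[key]) ** 2 for key in keys)
        )

    return sorted(index, key=lambda doc_id: distance(index[doc_id]))


# Hypothetical index: the values below are made up for the example.
index = {
    "doc_a": {"stroke_width": 2.0, "slant_deg": 5.0, "word_gap": 12.0},
    "doc_b": {"stroke_width": 4.5, "slant_deg": -10.0, "word_gap": 20.0},
    "doc_c": {"stroke_width": 2.2, "slant_deg": 7.0, "word_gap": 11.0},
}
query = {"stroke_width": 2.1, "slant_deg": 6.0, "word_gap": 12.0}

ranking = rank_by_similarity(query, index)
```

With these sample values the ranking is doc_a, doc_c, doc_b. A linear scan like this is only practical for small collections; searching millions of images would call for a dedicated nearest-neighbor index.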
For indexing based on writer characteristics, features are automatically extracted at several levels: entire document, word or phrase, and character. The user can also use interactive graphical tools to help obtain more accurate features, for example by enhancing the image or providing a transcript of the document. CiyaSoft is the leader in Arabic handwriting solutions and has developed SOCRAT, a state-of-the-art, one-of-a-kind application. No other company has developed an Arabic handwriting solution of this kind, which is built on real handwriting samples collected from 2,000 people in various geographical locations across Africa, the Middle East, and Asia. ARISTOW sits on top of SOCRAT and adds search capability to it. Internet portals and search engines can use this technology to search inside printed and handwritten images.
System for Offline Character Recognition of Text (SOCRAT) provides a solution for segmenting Arabic words, an important part of Arabic handwriting OCR. Government agencies and commercial companies can now process handwritten Arabic documents, including historical documents, easily and accurately. Preprocessing handwritten document images is important because it organizes the information and makes the actual recognition process simpler.
For example, in a document with mixed machine-printed and handwritten text, it is preferable to discriminate and separate the two types before the feature extraction phase. In addition, the handwritten text should be reduced to a normalized, standard, noise-free representation to make recognition easier. Arabic is spoken by more than 300 million people and has significance in the culture of over one billion people. With SOCRAT, handwritten Arabic documents, manuscripts, and historical documents can now be translated easily.
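The preprocessing step described above, bringing handwriting to a noise-free representation, can be sketched in its simplest form as thresholding followed by speck removal. This is a generic illustration under stated assumptions, not SOCRAT's pipeline: the fixed threshold of 128, the 8-neighbor rule, and both function names are choices made for this example.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 = ink, 0 = background."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def remove_isolated_pixels(binary):
    """Clear ink pixels that have no ink among their 8 neighbors (simple despeckle)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for r in range(height):
        for c in range(width):
            if not binary[r][c]:
                continue
            neighbours = sum(
                binary[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbours == 0:
                cleaned[r][c] = 0
    return cleaned


# Tiny made-up page: a two-pixel stroke plus one isolated speck in the corner.
gray = [
    [255, 255, 255, 255, 255],
    [255, 10, 20, 255, 255],
    [255, 255, 255, 255, 255],
    [255, 255, 255, 255, 30],
]
clean = remove_isolated_pixels(binarize(gray))
```

Here the two-pixel stroke survives and the corner speck is removed. Real scans with uneven lighting usually need an adaptive threshold rather than a single fixed value.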