Unstructured data management software solution for Arabic-script languages: Scanning, OCR, Machine Translation and Definition Management.
CiyaGate is a powerful software tool for government agencies and commercial companies that need to scan, OCR, machine translate and perform language-independent definition management. It locates and extracts information from unstructured data sources such as e-mails, web pages, internal documents and image files, creating a quick and reliable gateway to information that was previously hard to manage. CiyaGate is simple and intuitive, yet its indexing capabilities let you obtain the right information in a few clicks. Users save time by sharing large volumes of information quickly.
Optical character recognition (OCR) software that scans and recognizes paper documents with up to 95% accuracy and electronic documents with up to 98% accuracy.
For government agencies and commercial companies that deal with large quantities of Arabic / Farsi letters, contracts and manuals that require conversion into electronic files, CiyaICR turns paper into organized information, delivering real value to customers. Up to five pages per minute can be converted into electronic files. With the push of a button, you can efficiently scan, recognize and translate your paper documents and image-only electronic files (such as PDF). Documents can be converted into Word, Notepad, Excel, and other formats with ease.
CiyaTran™ MT is state-of-the-art, hybrid (linguistic- and statistical-based) Machine Translation software that translates to and from Farsi/Dari and English. With a translation speed of up to 1,200 pages per minute, documents can be translated in batch mode or in real time.
CiyaTran enables organizations to translate, in both directions, mission-critical information found in documents, e-mails, web pages, message boards, presentations, newspapers, magazines, technical papers and books, while preserving the grammatical structure, meaning and concepts of the source-language text. CiyaTran automatically corrects or adjusts for spelling errors, morphological variations and syntax deviations, and detects the format, encoding, language and domain of the input text.
DIEP allows government agencies and commercial companies to quickly identify and remove degradations from degraded text-image documents using a high-speed software and hardware implementation.
Degraded Image Enhancement Processor (DIEP) technology will automatically remove multiple image degradations and will easily keep pace with high-speed Handwritten Optical Character Recognition, Printed Optical Character Recognition and Machine Translation.
MT Engine Components:
Sentence Segmentation Utilities: Separate sentences or sentence fragments that are independent of the neighboring sentence, sentence fragment or title.
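As a rough illustration of what sentence segmentation involves (this is not CiyaTran's actual method, which is not described here), the sketch below splits text after sentence-final punctuation. The function name `split_sentences` and the chosen punctuation set are assumptions made for the example.

```python
import re

# Sentence-final marks: Latin ". ! ?" plus the Arabic question mark (U+061F)
# and the Urdu full stop (U+06D4). This set is an assumption for illustration.
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F\u06D4])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences after sentence-final punctuation."""
    parts = _SENTENCE_END.split(text.strip())
    return [p for p in parts if p]
```

A rule this simple will also split after abbreviations such as "Dr.", and it does not recognize titles or fragments that lack final punctuation, which is why a production segmenter needs more than a punctuation rule.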
Word Segmentation Utilities: A space is not always present as a word separator in Farsi, Dari, Pashto, Urdu and Arabic, so word boundaries cannot be found by splitting on whitespace alone.
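One common way to segment text that lacks spaces is greedy longest-match against a word list. The sketch below shows that technique under stated assumptions: the function name `segment_words` and the lexicon are hypothetical, and nothing here is taken from the product itself.

```python
def segment_words(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation of an unspaced string."""
    max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words
```

For example, `segment_words("thecatsat", {"the", "cat", "sat"})` yields the three words in order. Greedy matching can choose the wrong boundary when one word is a prefix of another, so it is a baseline rather than a complete solution.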
Morphological Analyzer, Syntactical Analyzer, Pattern and Production Utilities: Identify patterns such as numerals, dates, times, titles, enumerations, itemizations, and adverb, adjective and verb formations. The production utilities use databases of mathematically represented grammatical models to simplify sentence structures.
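To make the idea of pattern identification concrete, here is a minimal sketch that labels dates, times and numerals with regular expressions. The pattern table and the name `find_patterns` are assumptions for illustration; the formats the actual utilities recognize are not documented here.

```python
import re

# Hypothetical pattern table; order matters, so dates and times are tried
# before bare numerals. For str patterns, \d matches Arabic-Indic and
# Persian digits as well as ASCII digits.
_PATTERNS = [
    ("DATE", r"\d{1,4}[/-]\d{1,2}[/-]\d{1,4}"),
    ("TIME", r"\d{1,2}:\d{2}"),
    ("NUMERAL", r"\d+"),
]
_TOKEN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in _PATTERNS))


def find_patterns(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in _TOKEN.finditer(text)]
```

Calling `find_patterns("Meeting on 2024/05/17 at 09:30 for 45 minutes")` labels the date, the time and the numeral in that order.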
Static Regression Utilities: Software for intelligent comparison of a static database of sentences with known correct translations against a dynamic database of the same sentences, re-translated after each successive change to the MT engine or its databases.
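The regression idea can be sketched as a comparison of two mappings from source sentence to translation. This example assumes both databases fit in memory as dictionaries and uses plain string equality in place of the "intelligent comparison" mentioned above, whose details are not given; `compare_translations` is a hypothetical name.

```python
def compare_translations(reference: dict[str, str],
                         current: dict[str, str]) -> dict[str, list[str]]:
    """Classify each reference sentence by how its current translation compares."""
    report = {"unchanged": [], "changed": [], "missing": []}
    for source, expected in reference.items():
        actual = current.get(source)
        if actual is None:
            report["missing"].append(source)
        elif actual.strip() == expected.strip():
            report["unchanged"].append(source)
        else:
            # The engine output differs from the known-correct translation.
            report["changed"].append(source)
    return report
```

Running this after every engine or database change surfaces the sentences whose output no longer matches the known-correct translation.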
Sentence Generator (SenGen™): This utility creates trillions of different sentences and their correct translations for testing bidirectional MT. SenGen can be used in conjunction with the Batch Processor. The count and type of sentences generated by SenGen are determined by criteria set by the user.
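One plausible way to generate many sentence pairs is to combine bilingual slot fillers through a template. The sketch below is an assumption about the general technique, not a description of SenGen; the word lists are toy data, the target strings are placeholders rather than real translations, and `limit` stands in for a user-set count criterion.

```python
from itertools import islice, product

# Toy bilingual slot fillers: (source phrase, target phrase). The target
# strings are placeholders, not real translations.
SUBJECTS = [("the student", "SUBJ1"), ("the teacher", "SUBJ2")]
OBJECTS = [("a book", "OBJ1"), ("a letter", "OBJ2")]
VERBS = [("reads", "VERB1"), ("writes", "VERB2")]


def generate_pairs(limit=None):
    """Yield (source, target) pairs; source is SVO, target is SOV."""
    combos = product(SUBJECTS, VERBS, OBJECTS)
    for (s_src, s_tgt), (v_src, v_tgt), (o_src, o_tgt) in islice(combos, limit):
        yield f"{s_src} {v_src} {o_src}", f"{s_tgt} {o_tgt} {v_tgt}"
```

The number of pairs is the product of the slot sizes (here 2 x 2 x 2 = 8), so a handful of templates with large word lists quickly reaches very large counts.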
Statistical Analyzer: This utility collects data from the Web, e-mails and other electronic sources. It determines the language and format, performs the necessary format conversions, removes duplicate files, and maintains frequency counts.
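Two of the steps above, duplicate removal and frequency counting, can be sketched as follows. The example assumes documents are already converted to text, treats only byte-identical documents as duplicates, and assumes the frequencies are word frequencies; `dedupe_and_count` is a hypothetical name.

```python
import hashlib
from collections import Counter


def dedupe_and_count(documents: list[str]) -> tuple[list[str], Counter]:
    """Drop exact-duplicate documents and count word frequencies in the rest."""
    seen = set()
    unique = []
    freqs = Counter()
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        unique.append(doc)
        # Whitespace tokenization; unspaced scripts would need segmentation first.
        freqs.update(doc.split())
    return unique, freqs
```

Hashing keeps memory use proportional to the number of distinct documents rather than their total size, at the cost of missing near-duplicates that differ by even one character.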