docas[extract]

Rule-based XML data acquisition from output spool files


docas[extract] enables companies to take part in e-business without modifying their existing applications. Generating electronic invoices from existing print data and publishing them in various media (e-payment platforms, B2B platforms, web services, e-mail, WML) is just one possible application, and one that can be integrated quickly into an existing infrastructure. docas[extract] extracts document content from existing proprietary mass print data streams such as AFP and PCL and transforms it into data-semantic XML. XML is the basic technology for making information portable across platforms, applications and organisations.

Why convert documents to XML?

XML owes its popularity to its role as a basic technology for portability: it makes it easy to disseminate business information to different platforms and applications. Its particular strength lies in the exchange of structured data between different organisations. But XML is just as important for generating, archiving, indexing and publishing business documents. Today, unstructured and semi-structured data are stored in many different locations within a company and typically account for 80 % of all corporate data.
Probably the most important reason for converting documents into XML, however, is the need to publish them in multiple media (multi-channel publishing).

Rule-based data extraction

Rule-based data extraction acquires XML data from proprietary print data streams, so print data can be exported to XML without modifying or reprogramming existing systems. A simple visual definition tool is used to specify how the content of a document is to be represented in XML.
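To make the idea of a rule concrete, here is a minimal Python sketch. It assumes the print data has already been decoded into plain-text page lines (real AFP or PCL streams need a format-specific parser first). The `Rule` class, its regex-based matching and the `extract` function are hypothetical illustrations, not the docas[extract] rule format; in the product, rules are defined visually rather than in code.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical rule format for illustration only; not the docas[extract] rule syntax.
@dataclass
class Rule:
    element: str  # name of the XML element to create
    pattern: str  # regex with one capture group holding the field value


def extract(page_lines: list[str], rules: list[Rule], root_tag: str = "invoice") -> ET.Element:
    """Apply each rule to the text lines of one page and build an XML tree."""
    root = ET.Element(root_tag)
    for rule in rules:
        regex = re.compile(rule.pattern)
        for line in page_lines:
            match = regex.search(line)
            if match:
                ET.SubElement(root, rule.element).text = match.group(1).strip()
                break  # first matching line wins for this rule
    # Rules that match nothing simply produce no element.
    return root


if __name__ == "__main__":
    # Invented sample page, already decoded to text (assumption).
    page = [
        "ACME Ltd                     INVOICE",
        "Invoice no: 2004-0815        Date: 12.03.2004",
        "Customer:   Smith & Sons",
        "Total due:  1,250.00 EUR",
    ]
    rules = [
        Rule("number", r"Invoice no:\s*(\S+)"),
        Rule("date", r"Date:\s*(\S+)"),
        Rule("customer", r"Customer:\s*(.+)"),
        Rule("total", r"Total due:\s*([\d.,]+)"),
    ]
    print(ET.tostring(extract(page, rules), encoding="unicode"))
```

The existing application keeps producing the same print output; only the rules decide which text ends up in which XML element.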

The Business Rule Designer

docas[extract] includes a WYSIWYG tool for defining how the content of documents is to be represented in XML. Based on these rules, print data are converted to XML fully automatically. The Designer can handle even extremely complex document structures.
Document types are analysed and assignment rules formulated using the docas[extract] Designer.

Document content can be highlighted in the WYSIWYG editor and linked to the corresponding data fields. During the definition process, the content extracted from the sample document is displayed. Complex conditions such as line breaks, page breaks, optional elements, dynamic text and multi-column tables can be declared easily.
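The following Python sketch shows what two of these conditions involve: a table whose rows continue across a page break, and an optional element that may or may not appear. It is an illustration under stated assumptions, not how docas[extract] works internally: pages are assumed to be lists of already decoded text lines, and the row layout (`ROW`), the discount line (`DISCOUNT`) and `extract_items` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical row layout (assumption): "<3-digit no>  <description>  <qty>  <amount>".
ROW = re.compile(r"^(\d{3})\s+(.+?)\s{2,}(\d+)\s+([\d.]+)$")
# Hypothetical optional line that may appear on any page.
DISCOUNT = re.compile(r"^Discount:\s*([\d.]+)")


def extract_items(pages: list[list[str]]) -> ET.Element:
    """Collect table rows that continue across page breaks into one <items> list."""
    root = ET.Element("invoice")
    items = ET.SubElement(root, "items")
    discount = None
    for page in pages:  # a page break is just the boundary between two lists
        for line in page:
            row = ROW.match(line)
            if row:
                item = ET.SubElement(items, "item", no=row.group(1))
                ET.SubElement(item, "description").text = row.group(2)
                ET.SubElement(item, "quantity").text = row.group(3)
                ET.SubElement(item, "amount").text = row.group(4)
                continue
            found = DISCOUNT.match(line)
            if found:
                discount = found.group(1)
            # Other lines (repeated page headers, column titles, "continued" notes) are skipped.
    if discount is not None:  # optional element: emitted only when present
        ET.SubElement(root, "discount").text = discount
    return root


if __name__ == "__main__":
    # Invented two-page sample with a table split by a page break.
    pages = [
        ["ACME Ltd   INVOICE 2004-0815   Page 1",
         "No   Description        Qty   Amount",
         "001  Widget A           2     20.00",
         "002  Widget B           1     15.50",
         "continued on next page"],
        ["ACME Ltd   INVOICE 2004-0815   Page 2",
         "No   Description        Qty   Amount",
         "003  Service fee        1     30.00",
         "Discount: 5.00"],
    ]
    ET.indent(extract_items(pages))  # pretty-printing; requires Python 3.9+
    print(ET.tostring(extract_items(pages), encoding="unicode"))
```

Because the repeated header lines never match the row pattern, the three rows end up in a single `<items>` list even though the table is split across two pages.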

Integration options

docas[extract] runs in batch mode or transactionally and can therefore be integrated into existing processes. A software development kit (SDK) allows its functions to be embedded in suitable development environments.

Transforming the document format

docas[extract] converts documents or print batches not only into data-semantic XML but also into other output formats for multi-channel publishing. For example, electronic invoices can be generated from a print data stream both as XML and in other standard commercial formats.

Supported operating systems

docas[extract] runs on Windows NT, 2000 and XP, Linux, zLinux, AIX and Sun Solaris.

Two-Pager docas[extract]