Copyright (c) 2002 Bill C. Riemers. This is free documentation; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. The GNU General Public License's references to "object code" and "executables" are to be interprete...
NAME
djvutoxml, djvuxmlparser - DjVuLibre XML Tools.
SYNOPSIS
djvutoxml [options] inputdjvufile [outputxmlfile]
djvuxmlparser [ -o djvufile ] inputxmlfile
DESCRIPTION
The DjVuLibre XML Tools provide for editing the metadata, hyperlinks, and hidden text associated with DjVu files. Unlike djvused(1), the DjVuLibre XML Tools rely on XML technology and can take advantage of XML editors and verifiers.
DJVUTOXML
The djvutoxml program creates an XML file, outputxmlfile, containing a reference to the original DjVu document inputdjvufile as well as tags describing the metadata, hyperlinks, and hidden text associated with the DjVu file.
The following options are supported:
- --page pagenum
- Select a page in a multi-page document. Without this option, djvutoxml outputs the XML corresponding to all pages of the document.
- --with-text
- Specifies that the HIDDENTEXT element for each page should be included in the output. If specified without the --with-anno flag, then the --without-anno flag is implied. If none of the --with-text, --without-text, --with-anno, or --without-anno flags is specified, then the --with-text and --with-anno flags are implied.
- --without-text
- Specifies that the HIDDENTEXT element for each page should not be included in the output. If specified without the --without-anno flag, then the --with-anno flag is implied.
- --with-anno
- Specifies that the area MAP element for each page should be included in the output. If specified without the --with-text flag, then the --without-text flag is implied.
- --without-anno
- Specifies that the area MAP element for each page should not be included in the output. If specified without the --without-text flag, then the --with-text flag is implied.
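The way the four flags imply one another can be easier to follow as code. The following is a minimal sketch of the rules as described above, not the tool's actual option parser; the function name resolve_output is hypothetical, and rejecting contradictory flags is an assumption, since the text does not say how djvutoxml handles them.

```python
KNOWN = {"--with-text", "--without-text", "--with-anno", "--without-anno"}


def resolve_output(flags):
    """Return (include_text, include_anno) for a collection of flags."""
    flags = set(flags)
    unknown = flags - KNOWN
    if unknown:
        raise ValueError(f"unsupported flags: {sorted(unknown)}")

    def explicit(kind):
        # True / False when the user chose explicitly, None otherwise.
        on = f"--with-{kind}" in flags
        off = f"--without-{kind}" in flags
        if on and off:
            # Assumption: the documentation does not define this case.
            raise ValueError(f"contradictory flags for {kind}")
        return True if on else False if off else None

    text = explicit("text")
    anno = explicit("anno")

    # No flags at all: both HIDDENTEXT and MAP are emitted.
    if text is None and anno is None:
        return True, True
    # Only one kind specified: the other kind takes the opposite setting.
    if text is None:
        text = not anno
    if anno is None:
        anno = not text
    return text, anno
```

For instance, resolve_output(["--with-text"]) yields (True, False): hidden text is included and the area MAP is left out, matching the implied --without-anno.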
DJVUXMLPARSER
Files produced by djvutoxml can then be modified using either a text editor or an XML editor. The djvuxmlparser program parses the XML file inputxmlfile in order to modify the metadata of the corresponding DjVu file.
- -o djvufile
- In principle, the target DjVu file is the file referenced by the OBJECT element of the XML file. This option provides the means to override the filename specified in the OBJECT element.
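A typical round trip exports the XML, edits it, and writes the changes back. The sketch below only assembles the two command lines from the synopsis; the helper names and the file names in the usage note are hypothetical, and nothing is executed here.

```python
def djvutoxml_argv(inputdjvufile, outputxmlfile=None, page=None):
    """Build the argument list for: djvutoxml [options] input [output]."""
    argv = ["djvutoxml"]
    if page is not None:
        argv += ["--page", str(page)]
    argv.append(inputdjvufile)
    if outputxmlfile is not None:
        argv.append(outputxmlfile)
    return argv


def djvuxmlparser_argv(inputxmlfile, djvufile=None):
    """Build the argument list for: djvuxmlparser [ -o djvufile ] input."""
    argv = ["djvuxmlparser"]
    if djvufile is not None:
        # -o overrides the target file named inside the XML.
        argv += ["-o", djvufile]
    argv.append(inputxmlfile)
    return argv
```

Each list can be handed to subprocess.run(argv, check=True), for example djvutoxml_argv("book.djvu", "book.xml", page=3) followed, after editing book.xml, by djvuxmlparser_argv("book.xml", djvufile="book-edited.djvu").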
DJVUXML DOCUMENT TYPE DEFINITION
The DjVuXML-s document type definition (DTD) file defines the input and output of the DjVu XML tools. The DjVuXML-s DTD is a simplification of the HTML DTD, with a few new attributes added that are specific to DjVu. Each of the specified pages of a DjVu document is represented as an OBJECT element within the BODY element of the XML file. Each OBJECT element may contain multiple PARAM elements to specify attributes like page name and gamma factor. Each OBJECT element may also contain one HIDDENTEXT element to specify the hidden text (usually generated with an OCR engine) within the DjVu page. In addition, each OBJECT element may reference a single area MAP element, which contains multiple AREA elements to represent all the hyperlink and highlight areas within the DjVu document.
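To make the element layout concrete, here is a small sketch that walks a hand-written document using Python's standard xml.etree.ElementTree module. The sample is illustrative only: the DjVuXML root name, the data, usemap, name, and value attributes (borrowed from HTML), the URLs, and the summarize helper are all assumptions and should be checked against the DTD and real djvutoxml output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; not actual djvutoxml output.
SAMPLE = """<DjVuXML>
<BODY>
<OBJECT data="file://localhost/tmp/example.djvu" usemap="page1">
<PARAM name="PAGE" value="page1" />
<PARAM name="DPI" value="300" />
<HIDDENTEXT><LINE>sample text</LINE></HIDDENTEXT>
</OBJECT>
<MAP name="page1">
<AREA href="http://example.org/" shape="rect" coords="10,10,50,50" />
</MAP>
</BODY>
</DjVuXML>"""


def summarize(root):
    """Return one summary dict per OBJECT (page) element."""
    # Index MAP elements by name so an OBJECT can look up its map.
    maps = {m.get("name"): m for m in root.iter("MAP")}
    pages = []
    for obj in root.iter("OBJECT"):
        # Assumption: usemap holds the MAP name, optionally with a '#'.
        area_map = maps.get((obj.get("usemap") or "").lstrip("#"))
        pages.append({
            "params": len(obj.findall("PARAM")),
            "has_hidden_text": obj.find("HIDDENTEXT") is not None,
            "areas": len(area_map.findall("AREA")) if area_map is not None else 0,
        })
    return pages
```

Calling summarize(ET.fromstring(SAMPLE)) reports one page with two PARAM elements, a HIDDENTEXT element, and one AREA in its map.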
PARAM Elements
Legal PARAM elements of a DjVu OBJECT include, but are not limited to, PAGE for specifying the page name, GAMMA for specifying the gamma correction factor (normally 2.2), and DPI for specifying the page resolution.
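Reading these parameters back out of an OBJECT element might look like the sketch below. It assumes PARAM carries name and value attributes as in HTML; the sample values and the page_params helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical OBJECT fragment with the three parameters named above.
SAMPLE = """<OBJECT>
<PARAM name="PAGE" value="cover.djvu" />
<PARAM name="GAMMA" value="2.2" />
<PARAM name="DPI" value="300" />
</OBJECT>"""


def page_params(obj):
    """Collect PAGE, GAMMA, and DPI from an OBJECT element's PARAM children."""
    params = {p.get("name"): p.get("value") for p in obj.findall("PARAM")}
    return {
        "page": params.get("PAGE"),
        # Missing parameters are reported as None rather than guessed.
        "gamma": float(params["GAMMA"]) if "GAMMA" in params else None,
        "dpi": int(params["DPI"]) if "DPI" in params else None,
    }
```

Because other PARAM names are legal as well, the helper ignores any it does not recognize instead of rejecting them.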
HIDDENTEXT Elements
A HIDDENTEXT element consists of nested PAGECOLUMNS, REGION, PARAGRAPH, LINE, and WORD elements. The most deeply nested element present should specify its bounding coordinates in top-down orientation, and its body should contain the text. Most DjVu documents use either LINE or WORD as the lowest-level element, but any element is legal as the lowest-level element. A white space is always added between WORD elements, and a line feed is always added between LINE elements. Since languages such as Japanese do not use spaces between words, it is quite common for Asian OCR engines to use WORD elements for individual characters instead.
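The spacing rules can be illustrated by reassembling plain text from a HIDDENTEXT element. This is a sketch under stated assumptions, not the tools' own logic: the coords attribute name and its values are hypothetical, the helper names are made up, and only LINE and WORD are handled as lowest-level elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one LINE made of WORDs, one LINE holding text directly.
SAMPLE = """<HIDDENTEXT>
  <PAGECOLUMNS><REGION><PARAGRAPH>
    <LINE><WORD coords="10,40,60,20">Hello</WORD><WORD coords="70,40,130,20">DjVu</WORD></LINE>
    <LINE coords="10,80,130,60">second line</LINE>
  </PARAGRAPH></REGION></PAGECOLUMNS>
</HIDDENTEXT>"""


def line_text(line):
    """Join WORD children with single spaces, or use the LINE body itself."""
    words = [w.text or "" for w in line.iter("WORD")]
    if words:
        return " ".join(words)
    return (line.text or "").strip()


def hidden_text(hidden):
    """Join all LINE elements with line feeds."""
    return "\n".join(line_text(line) for line in hidden.iter("LINE"))
```

With character-per-WORD output from an OCR engine, the same joining rule would place a space between every character, which is worth keeping in mind when searching the extracted text.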
MAP Elements
The body of a MAP element consists of AREA elements. In addition to the standard attributes of the HTML AREA element, attributes have been added to specify border type, border color, border width, and highlight colors. Legal values for each of these attributes are listed in the DjVuXML-s DTD. In addition, the shape oval has been added to the legal list of shapes. An oval uses a rectangular