pdf2djvu • man page

pdf2djvu (1)

Leading comments

    Title: pdf2djvu
   Author: Jakub Wilk <jwilk@jwilk.net>
Generator: DocBook XSL Stylesheets v1.79.1 <http://docbook.sf.net/>
     Date: 08/07/2017
   Manual: pdf2djvu manual
   Source: pdf2djvu 0.9.6
 Language: English

(The comments found at the beginning of the groff file "man1/pdf2djvu.1".)

NAME

pdf2djvu - creates DjVu files from PDF files

SYNOPSIS

pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file...
pdf2djvu {-i | --indirect} index-djvu-file [option...] pdf-file...
pdf2djvu {--version | --help | -h}

DESCRIPTION

This program creates a DjVu file from one or more Portable Document Format files.

OPTIONS

pdf2djvu accepts the following options:

Document type, file names

-o, --output=output-djvu-file

: Generate a bundled multi-page document. Write the file into output-djvu-file instead of standard output.

-i, --indirect=index-djvu-file

: Generate an indirect multi-page document. Use index-djvu-file as the index file name; put the component files into the same directory. The directory must exist and be writable.

--page-id-template=template

Specifies the naming scheme for page identifiers. Consult the "TEMPLATE LANGUAGE" section for the template language description.

The default template is "p{page:04*}.djvu".

For portability reasons, page identifiers:

: * must consist only of lowercase ASCII letters, digits, _, +, - and dot,

: * cannot start with a +, - or a dot,

: * cannot contain two consecutive dots,

: * must end with the .djvu or the .djv extension.
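
The four rules above can be checked mechanically. The following is a minimal sketch, not pdf2djvu's own code; the function name is_portable_page_id is hypothetical and introduced only for illustration.

```python
import re

# Allowed characters: lowercase ASCII letters, digits, '_', '+', '-' and '.'.
_ALLOWED = re.compile(r"[a-z0-9_+.-]+")


def is_portable_page_id(page_id: str) -> bool:
    """Return True if page_id satisfies the four portability rules above."""
    if not _ALLOWED.fullmatch(page_id):
        return False  # empty, or contains a disallowed character
    if page_id[0] in "+-.":
        return False  # must not start with '+', '-' or a dot
    if ".." in page_id:
        return False  # must not contain two consecutive dots
    return page_id.endswith((".djvu", ".djv"))


print(is_portable_page_id("p0001.djvu"))
print(is_portable_page_id("Page 1.djvu"))
```

An identifier such as p0001.djvu passes all four checks, while one containing uppercase letters or spaces fails the first.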

--page-id-prefix=prefix

: Equivalent to "--page-id-template=prefix{page:04*}.djvu".

--page-title-template=template

: Specifies the template for page titles. Consult the "TEMPLATE LANGUAGE" section for the template language description.
The default template is "{label}".

--no-page-titles

: Don't set page titles.

Resolution, page size

-d, --dpi=resolution

: Sets the desired resolution to resolution dots per inch. The default is 300 dpi. The allowed range is: 72 ≤ resolution ≤ 6000.

--media-box

: Use MediaBox to determine page size. CropBox is used by default.

--page-size=widthxheight

: Sets the preferred page size to width pixels × height pixels. The actual page size may be altered to respect the aspect ratio and DjVu limitations on resolution. (This option takes precedence over -d/--dpi.)

--guess-dpi

: Try to guess native resolution by inspecting embedded images. Use with care.

Image quality

--bg-slices=n+...+n, --bg-slices=n,...,n

: Specifies the encoding quality of the IW44 background layer. This option is similar to the -slice option of c44. Consult the c44(1) manual page for details. The default is 72+11+10+10.

--bg-subsample=n

: Specifies the background subsampling ratio. The default is 3. Valid values are integers between 1 and 12, inclusive.

--fg-colors=default

: Try to preserve all the foreground layer colors. This is the default.

--fg-colors=web

: Reduce foreground layer colors to the web palette (216 colors). This option is not recommended.
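
The 216-color web palette uses six levels per channel (0, 51, 102, 153, 204, 255). As an illustration only, the sketch below maps a color to the nearest palette entry; the function name to_web_palette is hypothetical, and how pdf2djvu itself performs the reduction is not stated here, so nearest-level rounding is an assumption.

```python
def to_web_palette(rgb):
    """Map an (r, g, b) color to the nearest web-palette color.

    Assumption: each channel is rounded independently to the nearest
    multiple of 51; pdf2djvu's actual reduction method may differ.
    """
    return tuple(51 * round(channel / 51) for channel in rgb)


print(to_web_palette((200, 30, 128)))  # (204, 51, 153)
```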

--fg-colors=n

: Use GraphicsMagick to reduce the number of distinct colors in the foreground layer to n. Valid values are integers between 1 and 4080. This option is not recommended.

--fg-colors=black

: Discard any color information from the foreground layer.

--monochrome

: Render pages as monochrome bitmaps. With this option, --bg-... and --fg-... options are not respected.

--loss-level=n

: Specifies the aggressiveness of the lossy compression. The default is 0 (lossless). Valid values are integers between 0 and 200, inclusive. This option is similar to the -losslevel option of cjb2; consult the cjb2(1) manual page for details. This option can be used only if the --monochrome option is also enabled.

--lossy

: Synonym for --loss-level=100.

--anti-alias

: Enable font and vector anti-aliasing. This option is not recommended.

Extraction

--no-metadata

Don't extract the metadata.

By default:

: * The following entries of the document information dictionary are extracted: Title, Author, Subject, Creator, Producer, CreationDate, ModDate. Timestamps are formatted according to RFC 3339[1], with date and time components separated by a single space.

: * The XMP metadata is extracted (or created) and updated accordingly.

: Note

If multiple input documents are specified, only metadata of the first one is taken into account.
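
For illustration, an RFC 3339 timestamp with a space between the date and time components looks like the output of the sketch below. The helper name format_timestamp is hypothetical, and details such as the precision or how the UTC offset is written by pdf2djvu are assumptions.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(dt: datetime) -> str:
    """Format dt in RFC 3339 style, using a space instead of 'T'."""
    return dt.isoformat(sep=" ", timespec="seconds")


# Hypothetical sample value: 7 August 2017, 12:30:00 at UTC+02:00.
example = datetime(2017, 8, 7, 12, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(format_timestamp(example))  # 2017-08-07 12:30:00+02:00
```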

--verbatim-metadata

: Keep the original metadata intact.

--no-outline

: Don't extract the document outline.

--hyperlinks=border-avis

: Make hyperlink borders always visible.
By default, a hyperlink border is visible only when the mouse is over the hyperlink.

--hyperlinks=#RRGGBB

: Force the specified border color for hyperlinks.

--no-hyperlinks, --hyperlinks=none

: Don't extract hyperlinks.

--no-text

: Don't extract the text.

--words

: Extract the text. Record the location of every word. This is the default.

--lines

: Extract the text. Record the location of every line, rather than every word.

--crop-text

: Extract no text outside the page boundary.

--no-nfkc

: Do not apply NFKC[2] normalization on the text, except for characters from the Alphabetic Presentation Forms block[3] (U+FB00-U+FB4F), which are normalized unconditionally.
The default is to apply NFKC normalization on all characters.
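
The difference between the two modes can be sketched with Python's standard unicodedata module. This is an illustration of the behavior described above, not pdf2djvu's implementation; the function name normalize_text is hypothetical, and normalizing the presentation-form characters one at a time is a simplifying assumption.

```python
import unicodedata

# Alphabetic Presentation Forms block, as given above.
_APF_FIRST, _APF_LAST = 0xFB00, 0xFB4F


def normalize_text(text: str, nfkc: bool = True) -> str:
    """Apply NFKC to all of text, or (nfkc=False) only to U+FB00-U+FB4F."""
    if nfkc:
        return unicodedata.normalize("NFKC", text)
    return "".join(
        unicodedata.normalize("NFKC", ch)
        if _APF_FIRST <= ord(ch) <= _APF_LAST
        else ch
        for ch in text
    )


sample = "\ufb01x\u00b2"  # LATIN SMALL LIGATURE FI, 'x', SUPERSCRIPT TWO
print(normalize_text(sample))              # fix2
print(normalize_text(sample, nfkc=False))  # ligature expanded, superscript kept
```

With normalization restricted to the presentation forms, the ligature is still expanded to "fi", but the superscript two is left untouched.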

--filter-text=command-line

: Filter the text through the command-line. The provided filter must preserve whitespace, control characters and decimal digits.
This option implies --no-nfkc.

-p, --pages=page-range

: Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each sub-range is either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Duplicate page numbers are not allowed. Pages are numbered from 1.
The default is to convert all pages.
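
The page-range syntax can be illustrated with a small parser. This is a sketch under stated assumptions, not the parser pdf2djvu uses: the function name parse_page_range is hypothetical, and rejecting descending sub-ranges such as 5-3 is an assumption that the text above does not address.

```python
def parse_page_range(spec: str) -> list:
    """Expand a page-range string such as '17,37-42' into page numbers."""
    pages = []
    seen = set()
    for part in spec.split(","):
        first, sep, last = part.partition("-")
        lo = int(first)                 # raises ValueError on malformed input
        hi = int(last) if sep else lo   # a single page is a range of length 1
        if lo < 1 or hi < lo:           # pages are numbered from 1
            raise ValueError(f"invalid sub-range: {part!r}")
        for page in range(lo, hi + 1):
            if page in seen:            # duplicate page numbers are not allowed
                raise ValueError(f"duplicate page number: {page}")
            seen.add(page)
            pages.append(page)
    return pages


print(parse_page_range("17,37-42"))  # [17, 37, 38, 39, 40, 41, 42]
```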

Performance

-j, --jobs=n

: Use n threads to perform conversion. The default is to use one thread.

-j0, --jobs=0

: Determine automatically how many threads to use to perform conversion.

Verbosity, help

-v, --verbose

: Display more informational messages while converting the file.

-q, --quiet

: Don't display informational messages while converting the file.

--version

: Output version information and exit.

-h, --help

: Display help and exit.

ENVIRONMENT

The following environment variables affect pdf2djvu on Unix systems:

OMP_*

: Details of runtime behavior with respect to parallelism can be controlled by several environment variables. Please refer to the OpenMP API specification[4] for details.

TMPDIR

: pdf2djvu makes heavy use of temporary files. It will store them in a directory specified by this variable. The default is /tmp.

TEMPLATE LANGUAGE

Template syntax

The template language is roughly modeled on the Python string formatting syntax[5].

A template is a piece of text which contains fields, surrounded by curly braces {}. Fields are replaced with appropriately formatted values when the template is evaluated. Moreover, {{ is replaced with a single { and }} is replaced with a single }.

Field syntax

Each field consists of a variable name, optionally followed by a shift, optionally followed by a format specification.

The shift is a signed (i.e. starting with a + or - character) integer.

The format specification consists of a colon, followed by a width specification.

The width specification is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content. Preceding the width specification with a zero (0) character enables zero-padding.

The width specification is optionally followed by an asterisk (*) character, which increases the minimum field width to the width of the longest possible content of the variable.

Available variables

dpage

: Page number in the DjVu document.

page, spage

: Page number in the PDF document.

label

: Page label (logical page number) in the PDF document.
This variable is available only for page titles.
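
To make the field syntax concrete, here is a minimal template evaluator written from the description above. It is a sketch, not pdf2djvu's implementation: the names render, values and max_values are hypothetical, padding with spaces when zero-padding is not requested is an assumption, and so is applying the shift to the "longest possible" value used by the asterisk. Stray single braces are left untouched here, whereas a real implementation would more likely reject them.

```python
import re

# "{{", "}}", or a field: name, optional shift, optional ":[0]width[*]".
_TOKEN = re.compile(
    r"\{\{|\}\}|\{([A-Za-z_]+)([+-]\d+)?(?::(0?)(\d+)(\*?))?\}"
)


def render(template, values, max_values=None):
    """Evaluate template; max_values gives the largest value per variable."""
    max_values = max_values or {}

    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name, shift, zero, width, star = match.groups()
        value = values[name]
        longest = max_values.get(name, value)
        if shift:  # shifts only make sense for numeric variables
            value += int(shift)
            longest += int(shift)  # assumption, see the lead-in
        text = str(value)
        min_width = int(width) if width else 0
        if star:  # widen to the longest possible content
            min_width = max(min_width, len(str(longest)))
        return text.zfill(min_width) if zero else text.rjust(min_width)

    return _TOKEN.sub(replace, template)


print(render("p{page:04*}.djvu", {"page": 7}, {"page": 120}))    # p0007.djvu
print(render("p{page:04*}.djvu", {"page": 7}, {"page": 12345}))  # p00007.djvu
```

With the default page identifier template, the asterisk matters only for documents with more than 9999 pages, where it widens the field so that all identifiers keep the same length.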

IMPLEMENTATION DETAILS

Layer separation algorithm

Unless the --monochrome option is on, pdf2djvu uses the following naive layer separation algorithm:

1. For each page, do the following:

: 1. Rasterize the page into a pixmap, in the usual manner.

2. Rasterize the page into another pixmap, omitting the following page elements:

: * text,

: * 1 bit-per-pixel raster images,

: * vector elements (except fills of large areas).

3. Compare both pixmaps, pixel by pixel:

: 1. If their colors match, classify the pixel as a part of the background layer.

: 2. Otherwise, classify the pixel as a part of the foreground layer.
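
The comparison step can be sketched as follows. Rasterization is outside the scope of the example, so both pixmaps are taken as given; the function name separate_layers and the representation of a pixmap as a list of rows of (r, g, b) tuples are assumptions made for illustration.

```python
def separate_layers(full, reduced):
    """Return a foreground mask for two same-sized pixmaps.

    full:    the page rasterized in the usual manner
    reduced: the page rasterized without text, 1 bit-per-pixel images
             and most vector elements
    The result holds True where a pixel belongs to the foreground layer
    (colors differ) and False where it belongs to the background layer.
    """
    return [
        [pixel_full != pixel_reduced
         for pixel_full, pixel_reduced in zip(row_full, row_reduced)]
        for row_full, row_reduced in zip(full, reduced)
    ]


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
full = [[BLACK, WHITE], [WHITE, WHITE]]     # one black text pixel
reduced = [[WHITE, WHITE], [WHITE, WHITE]]  # same page with the text omitted
print(separate_layers(full, reduced))  # [[True, False], [False, False]]
```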

BUG REPORTS

If you find a bug in pdf2djvu, please report it at the issue tracker[6] or to the mailing list[7].

AUTHOR

Jakub Wilk <jwilk@jwilk.net>

: Author.

NOTES

1.

RFC 3339

: www.ietf.org/rfc/rfc3339

2.

NFKC

: unicode.org/reports/tr15

3.

Alphabetic Presentation Forms block

: unicode.org/charts/PDF/UFB00.pdf

4.

OpenMP API specification

: www.openmp.org/specifications

5.

Python string formatting syntax

: docs.python.org/library/string.html#format-string-syntax

6.

the issue tracker

: github.com/jwilk/pdf2djvu/issues

7.

the mailing list

: groups.io/g/pdf2djvu


References

c44 (1)

cjb2 (1)

csepdjvu (1)

djvu (1)

djvudigital (1)
