This program creates a DjVu file from the Portable Document Format file pdf-file.
OPTIONS
pdf2djvu accepts the following options:
Document type, file names
-o, --output=output-djvu-file
    Generate a bundled multi-page document. Write the file into output-djvu-file instead of standard output.
-i, --indirect=index-djvu-file
    Generate an indirect multi-page document. Use index-djvu-file as the index file name; put the component files into the same directory. The directory must exist and be writable.
--pageid-template=template
    Specifies the naming scheme for page identifiers. Consult the "TEMPLATE LANGUAGE" section for the template language description.
    The default template is "p{page:04*}.djvu".
    For portability reasons, page identifiers:
    • must consist only of lowercase ASCII letters, digits, _, +, - and dot,
    • cannot start with a dot,
    • cannot contain two consecutive dots,
    • must end with the .djvu or the .djv extension.
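The four portability rules for page identifiers can be checked mechanically. The following is a minimal Python sketch, not code from pdf2djvu; the function name is_portable_page_id is hypothetical and chosen only for illustration.

```python
import re

# Allowed characters per the rules above:
# lowercase ASCII letters, digits, "_", "+", "-" and ".".
_ALLOWED = re.compile(r"[a-z0-9_+.-]+")


def is_portable_page_id(page_id: str) -> bool:
    """Return True if page_id satisfies all four portability rules.

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    if not _ALLOWED.fullmatch(page_id):
        return False  # contains a character outside the allowed set
    if page_id.startswith("."):
        return False  # cannot start with a dot
    if ".." in page_id:
        return False  # cannot contain two consecutive dots
    return page_id.endswith((".djvu", ".djv"))  # required extension


print(is_portable_page_id("p0001.djvu"))  # identifier of the default template's form
print(is_portable_page_id("Page1.djvu"))  # uppercase letter is rejected
```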
--pageid-prefix=prefix
    Equivalent to "--pageid-template=prefix{page:04*}.djvu".
--page-title-template=template
    Specifies the template for page titles. Consult the "TEMPLATE LANGUAGE" section for the template language description.
    The default is to set no page titles.
Resolution, page size
-d, --dpi=resolution
    Sets the desired resolution to resolution dots per inch. The default is 300 dpi. The allowed range is: 72 ≤ resolution ≤ 6000.
--media-box
    Use MediaBox to determine page size. CropBox is used by default.
--page-size=widthxheight
    Sets the preferred page size to width pixels × height pixels. The actual page size may be altered in order to respect aspect ratio and DjVu limitations on resolution. (This option takes precedence over -d/--dpi.)
--guess-dpi
    Try to guess native resolution by inspecting embedded images. Use with care.
Image quality
--bg-slices=n+...+n, --bg-slices=n,...,n
    Specifies the encoding quality of the IW44 background layer. This option is similar to the -slice option of c44. Consult the c44(1) manual page for details. The default is 72+11+10+10.
--bg-subsample=n
    Specifies the background subsampling ratio. The default is 3. Valid values are integers between 1 and 12, inclusive.
--fg-colors=default
    Try to preserve all the foreground layer colors. This is the default.
--fg-colors=web
    Reduce foreground layer colors to the web palette (216 colors). This option is not recommended.
--fg-colors=n
    Use GraphicsMagick to reduce the number of distinct colors in the foreground layer to n. Valid values are integers between 1 and 4080. This option is not recommended.
--fg-colors=black
    Discard any color information from the foreground layer.
--monochrome
    Render pages as monochrome bitmaps. With this option, --bg-... and --fg-... options are not respected.
--loss-level=n
    Specifies the aggressiveness of the lossy compression. The default is 0 (lossless). Valid values are integers between 0 and 200, inclusive. This option is similar to the -losslevel option of cjb2; consult the cjb2(1) manual page for details. This option is respected only along with the --monochrome option.
--lossy
    Synonym for --loss-level=100.
--anti-alias
    Enable font and vector anti-aliasing. This option is not recommended.
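A wrapper script may want to reject out-of-range values before invoking pdf2djvu. The sketch below collects the numeric ranges documented for --dpi, --bg-subsample, --fg-colors=n and --loss-level. The names LIMITS and format_option are hypothetical, and treating the --fg-colors range as inclusive is an assumption, since the text above does not say so explicitly.

```python
# Ranges copied from the option descriptions in this manual page.
# The table and the helper are illustrative, not part of pdf2djvu.
LIMITS = {
    "dpi": (72, 6000),
    "bg-subsample": (1, 12),
    "fg-colors": (1, 4080),  # assumed inclusive
    "loss-level": (0, 200),
}


def format_option(name: str, value: int) -> str:
    """Return '--name=value' after checking value against the documented range."""
    low, high = LIMITS[name]
    if not low <= value <= high:
        raise ValueError(f"--{name} must be between {low} and {high}, got {value}")
    return f"--{name}={value}"


print(format_option("dpi", 300))
print(format_option("bg-subsample", 3))
```

The returned strings can be appended to an argument list for the pdf2djvu command line.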
Extraction
--no-metadata
    Don't extract the metadata.
    By default:
    • The following entries of the document information dictionary are extracted: Title, Author, Subject, Creator, Producer, CreationDate, ModDate. Timestamps are formatted according to RFC 3339[1], with date and time components separated by a single space.
    The XMP metadata is extracted (or created) and updated accordingly.
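To illustrate the timestamp layout described above (RFC 3339 with a single space between the date and the time), here is a small Python sketch. It only demonstrates the format; the function name format_timestamp is hypothetical, and the exact way pdf2djvu writes the time zone offset is not specified here.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(moment: datetime) -> str:
    """Format an aware datetime as RFC 3339, using a space instead of 'T'.

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    if moment.tzinfo is None:
        raise ValueError("a time zone offset is required")
    return moment.isoformat(sep=" ", timespec="seconds")


example = datetime(2011, 4, 16, 21, 24, 37, tzinfo=timezone(timedelta(hours=2)))
print(format_timestamp(example))  # 2011-04-16 21:24:37+02:00
```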
--verbatim-metadata
    Keep the original metadata intact.
--no-outline
    Don't extract the document outline.
--hyperlinks=border-avis
    Make hyperlink borders always visible.
    By default, a hyperlink border is visible only when the mouse is over the hyperlink.
--hyperlinks=#RRGGBB
    Force the specified border color for hyperlinks.
--no-hyperlinks, --hyperlinks=none
    Don't extract hyperlinks.
--no-text
    Don't extract the text.
--words
    Extract the text. Record the location of every word. This is the default.
--lines
    Extract the text. Record the location of every line, rather than every word.
--crop-text
    Extract no text outside the page boundary.
--no-nfkc
    Don't NFKC[2]-normalize the text.
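For readers unfamiliar with NFKC, the following Python sketch shows what this normalization form does to a few characters that commonly occur in PDF text, using the standard unicodedata module. It illustrates the normalization form itself, not pdf2djvu's internal code.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (compatibility decomposition, then composition)."""
    return unicodedata.normalize("NFKC", text)


# U+FB01 (the "fi" ligature) is replaced by the two letters "f" and "i".
print(nfkc("\ufb01le"))  # file
# U+00B2 (superscript two) is replaced by the plain digit "2".
print(nfkc("x\u00b2"))  # x2
```

With --no-nfkc, such characters are kept as they appear in the PDF file.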
--filter-text=command-line
    Filter the text through the command-line. The provided filter must preserve whitespace, control characters and decimal digits.
    This option implies --no-nfkc.
-p, --pages=page-range
    Specifies pages to convert. page-range is a comma-separated list of sub-ranges. Each sub-range is either a single page (e.g. 17) or a contiguous range of pages (e.g. 37-42). Pages are numbered from 1.
    The default is to convert all pages.
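The page-range syntax can be expanded into an explicit list of page numbers. The Python sketch below is one possible reading of the description above; the function name parse_page_range is hypothetical, and rejecting descending sub-ranges such as 42-37 is an assumption, since the text does not say how they are treated.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a comma-separated list of sub-ranges into page numbers.

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    pages: List[int] = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        low = int(first)
        high = int(last) if dash else low  # a single page is a range of one
        if low < 1 or high < low:  # pages are numbered from 1
            raise ValueError(f"invalid sub-range: {part!r}")
        pages.extend(range(low, high + 1))
    return pages


print(parse_page_range("17,37-42"))  # [17, 37, 38, 39, 40, 41, 42]
```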
Performance
-j, --jobs=n
    Use n threads to perform conversion. The default is to use one thread.
-j0, --jobs=0
    Determine automatically how many threads to use to perform conversion.
Verbosity, help
-v, --verbose
    Display more informational messages while converting the file.
-q, --quiet
    Don't display informational messages while converting the file.
--version
    Output version information and exit.
-h, --help
    Display help and exit.
ENVIRONMENT
OMP_*
    Details of runtime behaviour with respect to parallelism can be controlled by several environment variables. Please refer to the OpenMP API specification[3] for details.
TEMPLATE LANGUAGE
Template syntax
The template language is roughly modelled on the Python string formatting syntax[4].
A template is a piece of text which contains fields, surrounded by curly braces {}. Fields are replaced with appropriately formatted values when the template is evaluated. Moreover, {{ is replaced with a single { and }} is replaced with a single }.
Field syntax
Each field consists of a variable name, optionally followed by a shift, optionally followed by a format specification.
The shift is a signed (i.e. starting with a + or - character) integer.
The format specification consists of a colon, followed by a width specification.
The width specification is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content. Preceding the width specification with a zero (0) character enables zero-padding.
The width specification is optionally followed by an asterisk (*) character, which increases the minimum field width to the width of the longest possible content of the variable.
Available variables
page, spage
    Page number in the PDF document.
dpage
    Page number in the DjVu document.
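The template rules can be made concrete with a small evaluator. The Python sketch below is an independent illustration, not pdf2djvu's implementation. The function name evaluate and the max_values argument are hypothetical. It also assumes that "the longest possible content" means the number of digits of the largest value the variable can take (after applying the shift), and it leaves unmatched single braces untouched.

```python
import re

# "{{", "}}", or a field: name, optional shift, optional ":[0][width][*]".
_TOKEN = re.compile(r"\{\{|\}\}|\{([a-z]+)([+-]\d+)?(?::(0?)(\d*)(\*?))?\}")


def evaluate(template: str, values: dict, max_values: dict) -> str:
    """Expand fields in template (hypothetical helper, for illustration only).

    values:     current value of each variable, e.g. {"page": 7}
    max_values: largest value each variable can take (assumption for "*")
    """
    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name, shift, zero, width, star = match.groups()
        offset = int(shift) if shift else 0
        min_width = int(width) if width else 0
        if star:
            # Widen to the longest possible content of the variable.
            min_width = max(min_width, len(str(max_values[name] + offset)))
        spec = ("0" if zero else "") + (str(min_width) if min_width else "") + "d"
        return format(values[name] + offset, spec)

    return _TOKEN.sub(replace, template)


# The default page identifier template, for page 7 of a 120-page document.
print(evaluate("p{page:04*}.djvu", {"page": 7}, {"page": 120}))  # p0007.djvu
```

With a document of 123456 pages, the asterisk would widen the same field to six digits, giving p000007.djvu for page 7 under the stated assumption.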
IMPLEMENTATION DETAILS
Layer separation algorithm
Unless the --monochrome option is on, pdf2djvu uses the following naïve layer separation algorithm:
1. For each page, do the following:
   1. Raster the page into a pixmap, in the usual manner.
   2. Raster the page into another pixmap, omitting the following page elements:
      • text,
      • 1 bit-per-pixel raster images,
      • vector elements (except fills of large areas).
   3. Compare both pixmaps, pixel by pixel:
      1. If their colors match, classify the pixel as a part of the background layer.
      2. Otherwise, classify the pixel as a part of the foreground layer.
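The comparison in step 3 can be sketched in a few lines of Python. This is a simplified illustration rather than pdf2djvu's actual code: the pixmaps are assumed to be equally sized lists of rows of (r, g, b) tuples, and the function name separate_layers is hypothetical.

```python
def separate_layers(full, without_foreground):
    """Return a mask with True for foreground pixels, False for background pixels.

    full:               the page rendered in the usual manner
    without_foreground: the page rendered without text, 1 bit-per-pixel images
                        and most vector elements
    Both arguments are equally sized lists of rows of (r, g, b) tuples
    (an assumed representation, for illustration only).
    """
    mask = []
    for full_row, reduced_row in zip(full, without_foreground):
        # Matching colors mean background; any difference means foreground.
        mask.append([a != b for a, b in zip(full_row, reduced_row)])
    return mask


white, black = (255, 255, 255), (0, 0, 0)
full_page = [[black, white], [white, white]]     # one black "text" pixel
reduced_page = [[white, white], [white, white]]  # same page with the text omitted
print(separate_layers(full_page, reduced_page))  # [[True, False], [False, False]]
```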
BUG REPORTS
If you find a bug in pdf2djvu, please report it at the issue tracker[5].
SEE ALSO
djvu(1), djvudigital(1), csepdjvu(1)
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
    Author.
COPYRIGHT
Copyright © 2007, 2008, 2009, 2010 Jakub Wilk
NOTES
1. RFC 3339
   http://www.ietf.org/rfc/rfc3339
2. NFKC
   http://unicode.org/reports/tr15/
3. OpenMP API specification
   http://openmp.org/wp/openmp-specifications/
4. Python string formatting syntax
   http://docs.python.org/library/string.html#format-string-syntax
5. the issue tracker
   http://code.google.com/p/pdf2djvu/issues/