hocr2djvused reads a hOCR[1] file (as produced by OCRopus[2] or Cuneiform[3]) from the standard input and converts it to a djvused script.
OPTIONS
Text segmentation options
-t lines, --details=lines
Record the location of every line. Don't record locations of individual words or characters.
-t words, --details=words
Record the location of every line and every word. Don't record locations of individual characters.
This is the default.
-t chars, --details=chars
Record the location of every line, every word and every character.
--word-segmentation=simple
Consider each non-empty sequence of non-whitespace characters a single word.
This is the default, despite being linguistically incorrect.
--word-segmentation=uax29
Use the Unicode Text Segmentation[4] algorithm to break lines into words.
This option breaks the assumption made by some DjVu tools that words are separated by spaces, and is therefore not recommended.
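The difference between the two word segmentation modes can be illustrated in Python. This is only a sketch, not the code hocr2djvused itself runs: segment_simple follows the definition of "simple" given above, while segment_rough_boundaries is a hypothetical regular-expression stand-in that merely separates punctuation from letters. It does not implement the Unicode Text Segmentation algorithm.

```python
import re


def segment_simple(line):
    """Mimic the description of --word-segmentation=simple:
    every non-empty run of non-whitespace characters is one word."""
    return line.split()


def segment_rough_boundaries(line):
    """Hypothetical stand-in, NOT UAX #29: runs of word characters and
    single punctuation characters become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", line)


line = "Hello, world!"
print(segment_simple(line))            # ['Hello,', 'world!']
print(segment_rough_boundaries(line))  # ['Hello', ',', 'world', '!']
```

In the second result the comma and the exclamation mark are tokens of their own with no space before them, which is the kind of output that tools expecting space-separated words may handle badly.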
Other options
--rotation=n
Assume that DjVu pages are rotated by n degrees.
--page-size=widthxheight
Specify that the page size is width pixels × height pixels.
This option is required for hOCR generated by Cuneiform and superfluous otherwise.
--version
Output version information and exit.
-h, --help
Display help and exit.
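As a rough illustration of how the options above combine, the following Python sketch assembles a hocr2djvused command line and can feed it a hOCR file on standard input. The helper names build_command and convert, and the file name page.hocr, are hypothetical; only the option spellings are taken from this page.

```python
import subprocess


def build_command(details="words", segmentation="simple",
                  rotation=None, page_size=None):
    """Assemble an argument list from the options documented above."""
    cmd = ["hocr2djvused",
           "--details=" + details,
           "--word-segmentation=" + segmentation]
    if rotation is not None:
        cmd.append("--rotation=%d" % rotation)
    if page_size is not None:
        width, height = page_size  # both in pixels
        cmd.append("--page-size=%dx%d" % (width, height))
    return cmd


def convert(hocr_path, **options):
    """Run hocr2djvused with the hOCR file on its standard input and
    return the generated djvused script as bytes.
    Assumes hocr2djvused is installed and on PATH."""
    with open(hocr_path, "rb") as hocr:
        result = subprocess.run(build_command(**options), stdin=hocr,
                                stdout=subprocess.PIPE, check=True)
    return result.stdout


# Example for Cuneiform output, where --page-size is required:
print(" ".join(build_command(page_size=(2480, 3508))))
# hocr2djvused --details=words --word-segmentation=simple --page-size=2480x3508

# Hypothetical call, left commented out because it needs the real tool:
# script = convert("page.hocr", page_size=(2480, 3508))
```

Spelling out the defaults (words, simple) is redundant but keeps the generated command self-documenting.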
SEE ALSO
ocrodjvu(1),
djvused(1)
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
Author.
COPYRIGHT
Copyright © 2008, 2009, 2010 Jakub Wilk
NOTES
1. hOCR
   http://docs.google.com/View?docid=dfxcv4vc_67g844kf
2. OCRopus
   http://ocropus.googlecode.com/
3. Cuneiform
   http://launchpad.net/cuneiform-linux
4. Unicode Text Segmentation
   http://unicode.org/reports/tr29/