HOCR2DJVUSED

Section: hocr2djvused manual (1) Updated: 05/24/2010
 

NAME

hocr2djvused - hOCR to djvused script converter  

SYNOPSIS

hocr2djvused [option...]
 

DESCRIPTION

hocr2djvused reads a hOCR[1] file (as produced by OCRopus[2] or Cuneiform[3]) from the standard input and converts it to a djvused script.
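As a rough illustration of the stdin-in, script-out flow described above, the following Python sketch feeds hOCR bytes to the tool and collects the result. It assumes that hocr2djvused is on PATH and writes the generated script to the standard output; the helper name hocr_to_djvused_script and its command parameter are hypothetical and exist only for this example.

```python
import subprocess


def hocr_to_djvused_script(hocr_bytes, command=("hocr2djvused",)):
    """Pipe hOCR data into the converter and return whatever it prints.

    Assumption: the converter writes the djvused script to stdout.
    `command` is a hypothetical hook so extra options can be appended,
    e.g. ("hocr2djvused", "--details=lines").
    """
    result = subprocess.run(
        list(command),
        input=hocr_bytes,        # hOCR goes in on the standard input
        stdout=subprocess.PIPE,  # capture the generated script
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

The returned bytes can be saved to a file and then applied to a DjVu document with djvused(1).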

OPTIONS

 

Text segmentation options

-t lines, --details=lines

Record location of every line. Don't record locations of particular words or characters.

-t words, --details=words

Record location of every line and every word. Don't record locations of particular characters.

This is the default.

-t chars, --details=chars

Record location of every line, every word and every character.
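The three detail levels can be pictured as different nesting depths in the hidden-text structure. The sketch below is only an illustration, not the tool's actual code: it assumes the parenthesized zone syntax used by djvused for hidden text (line, word and char zones, each followed by four coordinates), and the helpers quote, zone and render_line are hypothetical names.

```python
def quote(text):
    # Minimal quoting for this sketch: escape backslashes and double quotes.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def zone(kind, bbox, children):
    # Render one zone as "(kind x0 y0 x1 y1 child child ...)".
    x0, y0, x1, y1 = bbox
    return "({} {} {} {} {} {})".format(kind, x0, y0, x1, y1, " ".join(children))


def render_line(line, details="words"):
    """line = (line_bbox, [(word_bbox, [(char_bbox, char), ...]), ...])"""
    bbox, words = line
    if details == "lines":
        # Only the line box is recorded; the words collapse into one string.
        text = " ".join("".join(c for _, c in chars) for _, chars in words)
        return zone("line", bbox, [quote(text)])
    rendered = []
    for word_bbox, chars in words:
        if details == "chars":
            # Every character keeps its own box.
            kids = [zone("char", cb, [quote(c)]) for cb, c in chars]
        else:
            # Default level: one box per word, characters merged.
            kids = [quote("".join(c for _, c in chars))]
        rendered.append(zone("word", word_bbox, kids))
    return zone("line", bbox, rendered)
```

Rendering the same line at each level shows how the output grows: lines yields a single zone, words adds one nested zone per word, and chars adds one further zone per character.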

--word-segmentation=simple

Consider each non-empty sequence of non-whitespace characters a single word.

This is the default, despite being linguistically incorrect.
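A minimal Python sketch of this rule, assuming nothing beyond the definition above (the function name simple_words is hypothetical):

```python
import re


def simple_words(line):
    """Return (start, end, text) for each maximal run of non-whitespace."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", line)]
```

For the line "Hello, world!" this yields the two words "Hello," and "world!". The punctuation stays attached to its neighbouring word, which is why the rule is linguistically incorrect.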

--word-segmentation=uax29

Use the Unicode Text Segmentation[4] algorithm to break lines into words.

This option breaks the assumption made by some DjVu tools that words are separated by spaces, and is therefore not recommended.
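To see why the space-separation assumption fails, consider the sketch below. It does not implement UAX #29; the regular expression is a crude stand-in that happens to split this particular sample the way a proper word-boundary algorithm would (punctuation becomes its own segment). The name approx_segments is hypothetical.

```python
import re


def approx_segments(line):
    # Crude stand-in for UAX #29, NOT the real algorithm:
    # runs of word characters, or runs of punctuation, each as one segment.
    return re.findall(r"\w+|[^\w\s]+", line)


line = "Hello, world!"
segments = approx_segments(line)
# A tool that assumes words are separated by spaces would rebuild the line so:
rejoined = " ".join(segments)
```

Here rejoined differs from line, because "Hello" and "," were adjacent in the original text with no space between them.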

 

Other options

--rotation=n

Assume that DjVu pages are rotated by n degrees.

--page-size=widthxheight

Specify that the page size is width pixels × height pixels.

This option is required for hOCR generated by Cuneiform and superfluous otherwise.
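The sketch below shows one way a widthxheight value could be parsed, and one plausible reason the page height matters: hOCR bounding boxes are measured from the top-left corner of the page, whereas DjVu text zones are measured from the bottom-left corner, so flipping the vertical axis needs the page height. That this is the reason Cuneiform output needs the option is an assumption, and both function names are hypothetical.

```python
def parse_page_size(value):
    """Parse a 'WIDTHxHEIGHT' string such as '2480x3508' into two ints."""
    width, sep, height = value.partition("x")
    if not sep:
        raise ValueError("expected WIDTHxHEIGHT, got {!r}".format(value))
    return int(width), int(height)


def flip_bbox(bbox, page_height):
    """Convert a top-left-origin box (x0, y0, x1, y1) to bottom-left origin."""
    x0, y0, x1, y1 = bbox
    # The old bottom edge (y1) becomes the new lower y coordinate.
    return (x0, page_height - y1, x1, page_height - y0)
```

For example, on a page parsed from "2480x3508", the box (100, 200, 300, 250) maps to (100, 3258, 300, 3308).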

--version

Output version information and exit.

-h, --help

Display help and exit.
 

SEE ALSO

ocrodjvu(1), djvused(1)  

AUTHOR

Jakub Wilk <jwilk@jwilk.net>

 

COPYRIGHT


Copyright © 2008, 2009, 2010 Jakub Wilk
 

NOTES

1. hOCR
   http://docs.google.com/View?docid=dfxcv4vc_67g844kf
2. OCRopus
   http://ocropus.googlecode.com/
3. Cuneiform
   http://launchpad.net/cuneiform-linux
4. Unicode Text Segmentation
   http://unicode.org/reports/tr29/


 


This document was created by man2html, using the manual pages.
Time: 21:13:52 GMT, April 16, 2011