To see all available commands, list the contents of the $OCROSCRIPTS
directory (/usr/share/ocropus/scripts/ by default).
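
As a minimal sketch of that lookup, the helper below lists the script directory and falls back to the default path when $OCROSCRIPTS is unset. The function name list_ocroscripts is hypothetical and not part of OCRopus; it only wraps a plain ls.

```shell
# Hypothetical helper (not shipped with OCRopus): list the available scripts.
list_ocroscripts() {
    # Use $OCROSCRIPTS if set, otherwise the default path named above.
    dir="${OCROSCRIPTS:-/usr/share/ocropus/scripts}"
    if [ -d "$dir" ]; then
        ls "$dir"
    else
        echo "no script directory at $dir" >&2
        return 1
    fi
}
```

Running list_ocroscripts with no arguments prints one script name per line, or an error on stderr if the directory does not exist.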
The 'recognize' script uses Tesseract for recognition and writes HTML-based hOCR
output to stdout. Tesseract is probably the most mature text recognizer within
OCRopus at the moment. On its own, Tesseract doesn't do layout analysis, but
combined with OCRopus, it makes for a pretty good OCR system:
$ ocroscript recognize page.png > page.html
Here is a brief summary of the remaining commands. To find out which command
line arguments a command takes, you will need to read its script:
Simple document image degradation.
Convert hOCR output to plain text.
Given a line image, remove marginal noise and fix some other problems.