indexer -C [ -R ] [ -t tag ] [ -u pattern ] [ -s status ] [ -y content-type ] [ configfile ]
indexer -S [ -R ] [ -t tag ] [ -u pattern ] [ -s status ] [ -y content-type ] [ configfile ]
indexer -I [ -R ] [ -t tag ] [ -u pattern ] [ -s status ] [ -y content-type ] [ configfile ]
indexer -h|-?
The behaviour of indexer is controlled mainly via the configuration file indexer.conf(5), which it reads on startup. There is a compiled-in default for the configuration file name and location, so you don't need to specify it every time you run indexer, but you can specify an alternative configuration file as the last argument.
indexer supports HTML-formatted (text/html MIME type), XML-formatted (text/xml MIME type) and plain text (text/plain MIME type) documents. Support for other data types is provided by external programs called "parsers". A parser reads data of some type from stdin and writes text/html or text/plain data to stdout. See indexer.conf(5) for details.
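As an illustration of that stdin/stdout contract, here is a minimal parser sketch in Python. The input type (CSV, assumed to arrive as text/csv), the script itself and the function name csv_to_plain_text are hypothetical choices for the example; how a parser is registered with indexer is described in indexer.conf(5) and is not shown here.

```python
#!/usr/bin/env python3
"""Hypothetical indexer parser: converts CSV (assumed text/csv) to text/plain.

Reads the raw document from stdin and writes plain text to stdout,
which is the contract indexer expects from an external parser.
"""
import csv
import io
import sys


def csv_to_plain_text(data: bytes, encoding: str = "utf-8") -> str:
    """Flatten CSV rows into one line of space-separated words per row."""
    text = data.decode(encoding, errors="replace")
    lines = []
    for row in csv.reader(io.StringIO(text)):
        # Drop empty cells so they do not produce runs of spaces.
        words = [cell.strip() for cell in row if cell.strip()]
        if words:
            lines.append(" ".join(words))
    return "\n".join(lines) + ("\n" if lines else "")


def main() -> None:
    # Read the whole document as bytes, emit text/plain on stdout.
    data = sys.stdin.buffer.read()
    sys.stdout.write(csv_to_plain_text(data))


if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    # Act as a filter only when input is piped in (e.g. by indexer),
    # so running the script interactively does not block on the terminal.
    main()
```

For example, piping the CSV text `name,city` / `Alice,"Izhevsk, RU"` through this script yields the two plain-text lines `name city` and `Alice Izhevsk, RU`.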
You may run indexer regularly from cron(8) to keep metadata up to date.
indexer is also used to manipulate the database. It can clear data from the database, output statistics and calculate popularity ranking.
By default, indexer reindexes only those documents that are "expired", i.e. documents for which the time since their last reindexing is greater than the "Period" value from indexer.conf(5). The -a option disables this feature, so all documents are reindexed regardless of their state. To achieve this, indexer first marks all URLs as "expired". This has the following side effect: if you start indexer -a, terminate it (for example, by pressing Ctrl-C) and start it again, all URLs will still be considered "expired" and will be reindexed again.
The combination of -e and -n number is particularly useful. For example, you can use indexer -e -n 100 to reindex just the 100 most expired documents.
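The selection behaviour described for -a, -e and -n can be sketched as follows. This is a simplified model, not indexer's actual code: the Doc record, its last_indexed field, the single global period and the function select_for_reindex are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    url: str
    last_indexed: float  # hypothetical: time of the last reindexing, in seconds


def select_for_reindex(docs: List[Doc], now: float, period: float,
                       reindex_all: bool = False,
                       most_expired_first: bool = False,
                       limit: Optional[int] = None) -> List[str]:
    """Model of which URLs one run would pick up.

    reindex_all mimics -a, most_expired_first mimics -e, limit mimics -n.
    """
    if reindex_all:
        # -a: every document counts as expired, whatever its age.
        candidates = list(docs)
    else:
        # Expired: time since the last reindexing exceeds Period.
        candidates = [d for d in docs if now - d.last_indexed > period]
    if most_expired_first:
        # Oldest last_indexed first, i.e. the most overdue documents.
        candidates.sort(key=lambda d: d.last_indexed)
    if limit is not None:
        candidates = candidates[:limit]
    return [d.url for d in candidates]
```

Under this model, indexer -e -n 100 corresponds to `select_for_reindex(docs, now, period, most_expired_first=True, limit=100)`. Without -e the sketch keeps whatever order the documents arrive in.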
Subsection control
tag is a server tag that you can arbitrarily set in the configuration file indexer.conf(5).
pattern is an SQL LIKE wildcard pattern matched against the URL. In short, an underscore (_) matches any single character, a percent sign (%) matches any sequence of characters (including none), and the comparison is case-insensitive. For example, indexer -u %izhcom.ru% will reindex all documents whose URLs contain the string "izhcom.ru".
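To make the wildcard rules concrete, the following Python sketch translates a LIKE pattern into a regular expression with the same meaning. It is illustrative only: the real matching is done by the SQL database, whose case sensitivity and escape handling may differ, and the helper names like_to_regex and url_matches are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an SQL LIKE pattern into an equivalent compiled regex.

    '_' matches exactly one character and '%' matches any run of
    characters (including none); every other character is literal.
    LIKE ... ESCAPE sequences are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # e.g. '.' stays a literal dot
    # IGNORECASE mirrors the case-insensitive comparison described above.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def url_matches(url: str, pattern: str) -> bool:
    # LIKE must match the whole string, hence fullmatch rather than search.
    return like_to_regex(pattern).fullmatch(url) is not None
```

Note that the dot in %izhcom.ru% is an ordinary character in LIKE syntax, which is why the sketch escapes it instead of treating it as a regex wildcard.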
status is a filter on the document's HTTP status obtained during the last reindexing. For example, -s 0 selects all documents that have not been indexed yet, -s 200 selects all documents that were retrieved with "HTTP 200 OK" status, and -s 301 selects all documents that were retrieved with "HTTP 301 Moved Permanently" status. See the HTTP protocol specifications for details on HTTP status codes and their meanings.
category is a filter for documents that belong to a specific category. Categories are similar to tags, but they can be nested.
content-type is a filter on MIME type: it selects documents with that Content-Type.
You can freely combine any number of -t, -u, -s, -g and -y options. Filters of the same class (for example, tag, pattern or status) are combined using logical OR, and filters of different classes are combined using logical AND. That means that if you type indexer -u %izhcom.ru% -u %udm.net% -t 1 -s 200, the documents to be indexed will be those with tag 1 and HTTP status 200 whose URLs contain the string "izhcom.ru" or "udm.net".
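The combination rule fits in a few lines of Python. The sketch below is a hypothetical model, not part of mnoGoSearch: each filter class maps to a list of predicates, predicates within a class are ORed, and the classes are ANDed. The Doc record and the name select are assumptions; the example filters reproduce the command above, and because %izhcom.ru% only requires the string to appear somewhere in the URL, a case-insensitive substring test stands in for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Doc:
    url: str
    tag: str
    status: int


Predicate = Callable[[Doc], bool]


def select(doc: Doc, filters: Dict[str, List[Predicate]]) -> bool:
    """OR the predicates inside each class, then AND the classes together.

    A class with no predicates places no restriction on the document.
    """
    return all(any(pred(doc) for pred in preds)
               for preds in filters.values() if preds)


# indexer -u %izhcom.ru% -u %udm.net% -t 1 -s 200
example_filters: Dict[str, List[Predicate]] = {
    "pattern": [lambda d: "izhcom.ru" in d.url.lower(),  # -u %izhcom.ru%
                lambda d: "udm.net" in d.url.lower()],   # -u %udm.net%
    "tag": [lambda d: d.tag == "1"],                     # -t 1
    "status": [lambda d: d.status == 200],               # -s 200
}
```

With these filters, a document at http://udm.net/ with tag 1 and status 200 is selected, while the same URL with tag 2 is not, since the tag class then has no matching predicate.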
Logging options
Misc.
The -C option erases data previously collected by indexer from the mnoGoSearch databases. You can use the options -t, -u and -s described above to select what you want to delete.
WARNING: Use this option with extreme caution!
The -S option outputs brief statistics: how many documents are in the database, their HTTP status, and how many documents are expired. You can use the options -t, -u and -s described above to select which documents you want statistics on.
The -I option shows the referrers of URLs or, in other words, all hyperlinks from the documents. You can use the options -t, -u and -s described above to select which documents you want to show referrers for.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.