mcxquery - compute simple graph statistics

mcxq is not in fact a program. This manual
page documents the behaviour and options of the **mcx** program when
invoked in mode *q*. The options **-h**, **--apropos**,
**--version**, **-set**, **--nop**, and **-progress** *<num>*
are accessible in all **mcx** modes. They are described
in the **mcx** manual page.

**mcxquery**
**[-imx** <fname> (*specify matrix input*)**]**
**[-o** <fname> (*output file name*)**]**
**[-tab** <fname> (*use tab file*)**]**
**[--node-attr** (*output node degree and weight attributes*)**]**
**[-vary-threshold** <start,end,step,scale> (*analyze graph at similarity cutoffs*)**]**
**[-vary-knn** <start,end,step,scale> (*analyze graph for varying k-NN*)**]**
**[-report-scale** <num> (*edge weight/threshold scaling*)**]**
**[--knn-reduce** (*use reduced matrix*)**]**
**[--test-cycle** (*test whether graph contains cycles*)**]**
**[-test-cycle** <num> (*test cycles, report cyclees*)**]**
**[--vary-correlation** (*analyze graph at correlation cutoffs*)**]**
**[--clcf** (*include clustering coefficient analysis*)**]**
**[-div** <num> (*cluster size separating value*)**]**
**[--dim** (*report native format and dimensions*)**]**
**[-t** <num> (*number of threads to use*)**]**
**[-icl** <fname> (*input clustering*)**]**
**[-tf** spec (*apply tf-spec to input matrix*)**]**
**[-h** (*print synopsis, exit*)**]**
**[--apropos** (*print synopsis, exit*)**]**
**[--version** (*print version, exit*)**]**

The main use of **mcxquery** is to analyze a graph at different similarity
cutoffs. Typically this is done on a graph constructed using a
very permissive threshold. For example, one can create a graph from
array expression data using **mcxarray** with a very low Pearson correlation
cutoff such as 0.2 or 0.3. Then **mcxquery** can be used to analyze
the graph at increasingly stringent thresholds of 0.25, 0.30,
0.35 .. 0.95.
The attributes reported at each threshold are the number of connected
components, the number of singletons, and statistics (median, average, iqr)
on node degrees and edge weights.
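
As a rough illustration of the kind of per-threshold summary described above, the following Python sketch thresholds a small weighted edge list and counts connected components and singletons. It is not **mcx** code. The function name `summarize_at_threshold`, the edge-list representation, and the choice to keep edges whose weight is greater than or equal to the cutoff are all assumptions made for the example.

```python
from statistics import median


def summarize_at_threshold(n_nodes, edges, cutoff):
    """Summarize an undirected graph after dropping edges below cutoff.

    edges is a list of (u, v, weight) tuples with 0 <= u, v < n_nodes.
    Keeping edges with weight >= cutoff is an assumption of this sketch.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = [0] * n_nodes
    for u, v, w in edges:
        if w >= cutoff:
            degree[u] += 1
            degree[v] += 1
            parent[find(u)] = find(v)

    # Count how many nodes end up under each component root.
    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1

    return {
        "components": len(sizes),
        "singletons": sum(1 for s in sizes.values() if s == 1),
        "median_degree": median(degree),
    }


# Hypothetical toy graph on 5 nodes.
edges = [(0, 1, 0.9), (1, 2, 0.5), (2, 3, 0.3), (3, 4, 0.8)]
for cutoff in (0.25, 0.6):
    print(cutoff, summarize_at_threshold(5, edges, cutoff))
```

Raising the cutoff from 0.25 to 0.6 removes the two weakest edges, so the single component splits and node 2 becomes a singleton.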

**-imx** <fname> (*input matrix*)

The file name for input that is in mcl native matrix format.

**-o** <fname> (*output file name*)

Set the name of the file where output should be written to.

**-tab** <fname> (*use tab file*)

This option causes the output to be printed with the labels
found in the tab file.

**--dim** (*report native format and dimensions*)

This will report the matrix format (either interchange or binary)
and the matrix dimensions. For a graph the two reported dimensions
should be equal.

**-vary-threshold** <start,end,step,scale> (*analyze graphs at similarity cutoffs*)

All of *start*, *end*, *step* and *scale* must
be integers. From these a list of thresholds is constructed:
*start / scale*, *(start + step) / scale*, *(start + 2 step) /
scale*, and so on until a value larger than or equal to *end / scale* is reached.
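
The arithmetic above can be sketched in Python as follows. This is an illustration, not the **mcx** implementation. The helper name `threshold_list` is hypothetical, and the sketch assumes that the first value reaching *end / scale* is itself excluded from the list.

```python
def threshold_list(start, end, step, scale):
    """Return start/scale, (start+step)/scale, ... for values below end/scale.

    Assumption of this sketch: the value that reaches end/scale is excluded.
    """
    if step <= 0 or scale <= 0:
        raise ValueError("step and scale must be positive integers")
    # Step through integers and divide last, so rounding error does not accumulate.
    return [k / scale for k in range(start, end, step)]


# Arguments as they would be written for -vary-threshold 10,100,2,100.
thresholds = threshold_list(10, 100, 2, 100)
print(thresholds[:3], "...", thresholds[-1])
```

Stepping through integers and dividing once per value avoids the drift that repeatedly adding a fractional increment such as 0.02 would introduce.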

**--vary-correlation** (*analyze graphs at correlation cutoffs*)

This instructs **mcxquery** to use a threshold list suitable for use with graphs
in which the edge weight similarities are correlations.
The list starts at 0.2 and ends at 0.95 using increments of 0.05.
If a different start or increment is required it can
be achieved by using the **-vary-threshold** option.
For example, a start of 0.10 and an increment of 0.02 are obtained
by issuing **-vary-threshold** **10,100,2,100**.

**--clcf** (*include clustering coefficient analysis*)

This option causes the global clustering coefficient to be computed
for the analysis of thresholded graphs with **--vary-correlation**
and **-vary-threshold**. For large graphs this may be time-consuming.

**-vary-knn** <start,end,step,scale> (*analyze graphs for varying k-NN*)

**--knn-reduce** (*use reduced matrix*)

This analyses a graph as it is subjected to varying *k*-NN (*k* mutual nearest
neighbours) selection steps. The analysis starts at the *end* argument,
and progresses towards the *start* argument using decrements of size *step*.
By default the reduction is always computed relative to the start matrix,
i.e. the input matrix after **-tf** transformations have optionally been
applied. Specifying **--knn-reduce** causes this to change so that
each new reduction is calculated relative to the reduction
just computed.

For graphs with ties among edge weights it may be useful to use
**-tf** **'#tug()'**. This will add small perturbations to the
edge weights and have the effect of breaking ties.
By default perturbations are computed using the cosine between
the vectors of neighbours of the two nodes incident to an edge.
This can be changed to a random perturbation with
**-tf** **'#rug()'**.
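
The following Python sketch illustrates mutual *k*-NN selection and the difference between reducing from the start matrix and reducing from the previous reduction. It is not **mcx** code, and it makes several assumptions: the names `mutual_knn` and `vary_knn` and the `chain` flag (standing in for **--knn-reduce**) are hypothetical, ties are broken by neighbour id, the *start* value is included, and the *scale* argument is ignored.

```python
def mutual_knn(adj, k):
    """Keep edge (u, v) only if v is in u's top-k and u is in v's top-k.

    adj maps node -> {neighbour: weight} and is assumed symmetric.
    Ties are broken by neighbour id, an arbitrary choice for this sketch.
    """
    top = {}
    for u, nbrs in adj.items():
        ranked = sorted(nbrs.items(), key=lambda item: (-item[1], item[0]))
        top[u] = {v for v, _ in ranked[:k]}
    return {
        u: {v: w for v, w in nbrs.items() if v in top[u] and u in top.get(v, ())}
        for u, nbrs in adj.items()
    }


def vary_knn(adj, start, end, step, chain=False):
    """Yield (k, reduced graph) for k = end, end - step, ... down to start.

    With chain=False every reduction is computed from the input graph.
    With chain=True each reduction is computed from the previous one.
    """
    current = adj
    for k in range(end, start - 1, -step):
        reduced = mutual_knn(current if chain else adj, k)
        if chain:
            current = reduced
        yield k, reduced


# Hypothetical symmetric toy graph.
adj = {
    "a": {"b": 0.9, "c": 0.5, "d": 0.1},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"a": 0.5, "b": 0.8},
    "d": {"a": 0.1},
}
for k, graph in vary_knn(adj, 1, 3, 1):
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    print(k, n_edges)
```

Breaking ties by neighbour id is only a placeholder here; the perturbation transforms described above are the documented way to handle ties.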

**--test-cycle** (*test whether graph contains cycles*)

**-test-cycle** <num> (*test cycles, report cyclees*)

Test whether the input graph contains cycles. With the second option
nodes that are part of a cycle are output, up to a maximum of *<num>*
nodes. Use *<num>*=**-1** to output all such nodes.
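
For readers unfamiliar with cycle testing, one common approach for undirected graphs is sketched below in Python using a union-find structure. This manual page does not say how **mcx** performs the test, so the technique, the function name `has_cycle`, and the treatment of the graph as undirected with each edge listed once are assumptions of this example. The sketch only answers whether a cycle exists; it does not report the nodes involved.

```python
def has_cycle(n_nodes, edges):
    """Return True if the undirected graph contains a cycle.

    edges is an iterable of (u, v) pairs, each undirected edge listed once.
    A self-loop (u == v) counts as a cycle in this sketch.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # u and v were already connected, so this edge closes a cycle.
            return True
        parent[root_u] = root_v
    return False


print(has_cycle(3, [(0, 1), (1, 2)]))          # a path
print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # a triangle
```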

**-report-scale** <num> (*edge weight / threshold scaling*)

The edge weight median, average, and inter-quartile range,
as well as the different threshold steps, are all rescaled
in the reported output to avoid printing fractional parts.
If **-vary-threshold** was supplied then the
scaling factor specified in its argument is used.
With **--vary-correlation** a scaling factor of 100
is used. Either can be overridden by using the present option.
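
The effect of the scaling can be illustrated with a short Python sketch. The helper name `rescale` is hypothetical, and rounding to the nearest integer is an assumption; the text above only states that fractional parts are avoided.

```python
def rescale(value, scale):
    """Multiply value by scale and round to the nearest integer.

    Rounding (rather than truncating) is an assumption of this sketch.
    """
    return round(value * scale)


# Correlation-style thresholds 0.20, 0.25, ... 0.95 reported with factor 100.
thresholds = [k / 100 for k in range(20, 100, 5)]
print([rescale(t, 100) for t in thresholds])
```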

**-div** <num> (*cluster size separating value*)

When analyzing graphs at different thresholds with one of the
options above, **mcxquery** reports the percentage of nodes contained
in clusters not exceeding a specified size, by default 3.
This number can be changed using the **-div** option.
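
A minimal Python sketch of this percentage, given a list of cluster sizes, might look as follows. The function name `percent_in_small_clusters` is hypothetical and the example sizes are made up.

```python
def percent_in_small_clusters(cluster_sizes, div=3):
    """Percentage of nodes that sit in clusters of size <= div."""
    total = sum(cluster_sizes)
    if total == 0:
        return 0.0
    small = sum(size for size in cluster_sizes if size <= div)
    return 100.0 * small / total


# Hypothetical cluster sizes covering 20 nodes in total.
sizes = [1, 1, 2, 3, 5, 8]
print(percent_in_small_clusters(sizes))         # default separating value 3
print(percent_in_small_clusters(sizes, div=1))  # singletons only
```

With the default value of 3, the clusters of size 1, 1, 2 and 3 qualify, which accounts for 7 of the 20 nodes.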

**-tf** <tf-spec> (*transform input matrix values*)

Transform the input matrix values according
to the syntax described in **mcxio(5)**.

**-t** <num> (*number of threads to use*)

This has an effect only when using the **-vary-knn** option,
and is only useful on multi-CPU machines.

**--node-attr** (*output node degree and weight attributes*)

Output is in the form of a tab separated file.
The option **-icl** can be used in conjunction.

**-icl** <fname> (*input clustering*)

Output for each node the size of the cluster it is in.
This option can be used in conjunction with **--node-attr**.

**mcxio(5)**,
and **mclfamily(7)** for an overview of all the documentation
and the utilities in the mcl family.

Time: 21:23:48 GMT, April 16, 2011