mcxarray - Transform array data to MCL matrices

**mcxarray** [options]

**mcxarray**
**[-data** fname (*input data file*)**]**
**[-imx** fname (*input matrix file*)**]**
**[-co** num (*cutoff for output values (required)*)**]**
**[--pearson** (*use Pearson correlation (default)*)**]**
**[--spearman** (*use Spearman rank correlation*)**]**
**[-skipr** <num> (*skip <num> data rows*)**]**
**[-skipc** <num> (*skip <num> data columns*)**]**
**[-o** fname (*output file fname*)**]**
**[-write-tab** <fname> (*write row labels to file*)**]**
**[-l** <num> (*take labels from column <num>*)**]**
**[-digits** <num> (*output precision*)**]**
**[--write-binary** (*write output in binary format*)**]**
**[-t** <int> (*use <int> threads*)**]**
**[-J** <intJ> (*a total of <intJ> jobs are used*)**]**
**[-j** <intj> (*this job has index <intj>*)**]**
**[-start** <int> (*start at column <int> inclusive*)**]**
**[-end** <int> (*end at column <int> EXclusive*)**]**
**[--transpose** (*work with the transposed data matrix*)**]**
**[-tf** spec (*transform result network*)**]**
**[-table-tf** spec (*transform input table before processing*)**]**
**[-n** mode (*normalize input*)**]**
**[--cosine** (*use cosine*)**]**
**[--zero-as-na** (*treat zeroes as missing data*)**]**
**[-write-data** <fname> (*write data to file*)**]**
**[-write-na** <fname> (*write NA matrix to file*)**]**
**[--job-info** (*print index ranges for this job*)**]**
**[--help** (*print this help*)**]**
**[-h** (*print this help*)**]**
**[--version** (*print version information*)**]**

**mcxarray** can either read a flat file containing array data (**-data**)
or a matrix file satisfying the mcl input format (**-imx**). In the
former case it will by default work with the rows as the data vectors. In
the latter case it will by default work with the columns as the data
vectors (note that mcl matrices are presented as a listing of columns).
Either default can be reversed with the
**--transpose** option.

The input data may contain missing data in the form of empty columns, NA values (not available/applicable), or NaN values (not a number). The program keeps track of these, and when computing the correlation between two rows or columns it ignores all positions where either of the two has missing data.
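The handling of missing data described above can be illustrated with a small Python sketch. It is not the mcxarray implementation: the function names are hypothetical, and returning NaN when fewer than two shared positions remain is an assumption made for this example only.

```python
import math

def is_missing(x):
    # Treat None and NaN as missing; an illustrative convention only.
    return x is None or (isinstance(x, float) and math.isnan(x))

def pearson_ignoring_missing(u, v):
    # Keep only the positions where both vectors have a value.
    pairs = [(a, b) for a, b in zip(u, v)
             if not is_missing(a) and not is_missing(b)]
    n = len(pairs)
    if n < 2:
        return math.nan  # assumption: too little shared data to correlate
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return math.nan  # a constant vector has no defined correlation
    return cov / math.sqrt(var_a * var_b)

row_a = [1.0, 2.0, math.nan, 4.0, 5.0]
row_b = [2.0, 4.0, 6.0, None, 10.0]
# Positions 2 and 3 are dropped; the three remaining pairs are used.
r = pearson_ignoring_missing(row_a, row_b)
```

Only positions 0, 1, and 4 contribute here, so the result reflects the three pairs that both rows share.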

**-data** fname (*input data file*)

Specify the data file containing the expression values.
It should be tab-separated.

**-imx** fname (*input matrix file*)

The expression values are read from a file in mcl matrix format.

**--pearson** (*use Pearson correlation (default)*)

**--spearman** (*use Spearman rank correlation*)

Use one of these to specify the correlation measure.
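As a rough illustration of how the two measures differ, Spearman rank correlation can be computed as the Pearson correlation of the ranks of the values. The Python sketch below assumes tied values share their average rank; whether mcxarray resolves ties this way is not stated here, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    # Rank from 1 upward; tied values share the mean of their ranks
    # (an assumption for this sketch).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def spearman(u, v):
    # Spearman is Pearson applied to the rank-transformed vectors.
    return pearson(average_ranks(u), average_ranks(v))

u = [10.0, 20.0, 30.0, 40.0]
v = [1.0, 4.0, 9.0, 100.0]
rho = spearman(u, v)   # monotone relation, so the ranks agree exactly
r = pearson(u, v)      # lower, because the relation is not linear
```

The two vectors rise together but not linearly, so the rank-based measure is maximal while the Pearson value is not.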

**-skipr** <num> (*skip <num> data rows*)

Skip the first *<num>* data rows.

**-skipc** <num> (*skip <num> data columns*)

Ignore the first *<num>* data columns.

**-l** <num> (*take labels from column <num>*)

Construct a tab of labels from data column *<num>*.
The tab can be written to file using **-write-tab** *fname*.

**-write-tab** <fname> (*write row labels to file*)

Write a tab file containing the row labels. In the simple case where the
labels are in the first data column it is sufficient to issue **-skipc** **1**.
If more data columns need to be skipped, the data column to take labels
from must be specified explicitly with **-l** *<num>*.
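The interplay of **-skipr**, **-skipc**, and **-l** can be pictured with the following Python sketch of reading a tab-separated table. It is an illustration only: the function and the sample table are hypothetical, column numbers for the label column are assumed to be 1-based, and the set of strings treated as missing is an assumption.

```python
import math

MISSING = {"", "NA", "NaN"}  # assumed spellings of missing values

def parse_value(field):
    # Empty fields, NA, and NaN are all mapped to NaN.
    field = field.strip()
    return math.nan if field in MISSING else float(field)

def read_table(lines, skip_rows=0, skip_cols=0, label_col=None):
    # Assumption: label_col is 1-based, and skipping exactly one column
    # means the labels come from that first column.
    if label_col is None and skip_cols == 1:
        label_col = 1
    labels, rows = [], []
    for line in lines[skip_rows:]:
        fields = line.rstrip("\n").split("\t")
        if label_col is not None:
            labels.append(fields[label_col - 1])
        rows.append([parse_value(f) for f in fields[skip_cols:]])
    return labels, rows

table = [
    "probe\ts1\ts2\ts3",   # header row, skipped via skip_rows=1
    "p1\t1.5\tNA\t2.0",
    "p2\t0.5\t1.0\t",      # trailing empty field counts as missing
]
labels, rows = read_table(table, skip_rows=1, skip_cols=1)
```

With two leading non-data columns and labels in the second, the equivalent call in this sketch would be `read_table(table, skip_cols=2, label_col=2)`.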

**-t** <int> (*use <int> threads*)

**-J** <intJ> (*a total of <intJ> jobs are used*)

**-j** <intj> (*this job has index <intj>*)

Computing all pairwise correlations is time-intensive for large input.
If multiple CPUs are available, consider using as many threads (**-t**).
The computation can additionally be spread over multiple jobs/machines
using **-J** and **-j**.
Conceptually, each job takes a number of threads from
the total thread pool.
The number of threads (as specified by **-t**)
currently *must be the same for all jobs*, as it is used
by each job to infer its own set of tasks.
For example, three commands that each specify four threads and a total
of three jobs, and that differ only in the job index given to **-j**,
define three jobs, each running four threads.
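A setup of three jobs with four threads each could be generated along the following lines. This Python sketch only builds the command lines; it assumes that job indices are counted from zero, which this page does not state, and the input file, cutoff value, and output file names are hypothetical.

```python
def job_commands(data_file, cutoff, threads, jobs):
    # Every job gets the same -t and -J values; only -j differs.
    commands = []
    for j in range(jobs):  # assumption: job indices start at 0
        commands.append([
            "mcxarray", "-data", data_file, "-co", str(cutoff),
            "-t", str(threads), "-J", str(jobs), "-j", str(j),
            "-o", f"out.job{j}.mcx",  # hypothetical output name per job
        ])
    return commands

commands = job_commands("expr.tsv", 0.7, threads=4, jobs=3)
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can then be submitted as a separate job, for instance on a different machine.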

**--job-info** (*print index ranges for this job*)

**-start** <int> (*start at column <int> inclusive*)

**-end** <int> (*end at column <int> EXclusive*)

**--job-info** can be used to list the set of column
ranges to be processed by the job as a result of the command
line options **-t**, **-J**, and **-j**.
If a job has failed, this option can be used to manually
split those ranges into finer chunks, each to be processed
as a new sub-job specified with **-start** and **-end**.
The **-start** and **-end** options cannot be combined with
parallelization of any kind
(i.e. any of the **-t**, **-J**, and **-j** options).
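Splitting a failed job's column range into finer chunks is simple arithmetic on half-open intervals, since **-start** is inclusive and **-end** is exclusive. The Python helper below is a hypothetical sketch; the range and chunk size are made-up values, not output of **--job-info**.

```python
def split_range(start, end, chunk):
    # Split the half-open range [start, end) into consecutive
    # half-open sub-ranges of at most `chunk` columns each.
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    return [(lo, min(lo + chunk, end)) for lo in range(start, end, chunk)]

# Example: a job that covered columns 100 up to (not including) 250.
sub_jobs = split_range(100, 250, 60)
for lo, hi in sub_jobs:
    print(f"-start {lo} -end {hi}")
```

Because each sub-range ends where the next begins, every column is covered exactly once.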

**-o** fname (*output file fname*)

Output file name.

**-digits** <num> (*output precision*)

Specify the precision to use in native interchange format.

**--write-binary** (*write output in binary format*)

Write output matrices in native binary format.

**-co** num (*cutoff for output values*)

Output values smaller than *num* are removed (set to zero).
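The effect of the cutoff on a sparse result can be sketched in Python as follows. The dictionary representation and the function name are hypothetical; the sketch simply drops every entry smaller than the cutoff, as described above.

```python
def apply_cutoff(entries, cutoff):
    # Keep only entries whose value is at least the cutoff; dropped
    # entries are implicitly zero in a sparse matrix.
    return {pair: value for pair, value in entries.items() if value >= cutoff}

correlations = {(0, 1): 0.95, (0, 2): 0.70, (1, 2): 0.42, (1, 3): -0.80}
kept = apply_cutoff(correlations, 0.7)
```

Under this reading a strong negative correlation such as -0.80 is removed as well, because it is smaller than the cutoff.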

**--transpose** (*work with the transpose*)

Work with the transpose of the input data matrix.

**-write-data** <fname> (*write data to file*)

Write the data as read in to file.
If **--spearman** is specified, the written data is
rank-transformed.

**-write-na** <fname> (*write NA matrix to file*)

This writes all positions for which no data was found
to file, in native mcl matrix format.

**--zero-as-na** (*treat zeroes as missing data*)

This option can be useful when reading data with the **-imx** option,
for example after it has been loaded from label input by **mcxload**.
An example case is the processing of a large number of probe rankings,
where not all rankings contain all probe names. The rankings can be loaded
using **mcxload** with a tab file containing all probe names.
Probes that are present in the ranking are given a positive ordinal
number reflecting the ranking, and probes that are absent are implicitly
given the value zero. With this option mcxarray treats those zeroes as
missing data, so positions where a probe is absent are ignored in the
correlation computation instead of being counted as rank zero.
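The probe-ranking case can be pictured with a short Python sketch. The rankings and helper names are hypothetical; the point is only that zeroes become missing values, so a comparison of two rankings uses just the probes present in both.

```python
import math

def zeros_to_missing(vec):
    # A zero means "probe absent from this ranking", so mark it as NaN.
    return [math.nan if x == 0 else float(x) for x in vec]

def shared_positions(u, v):
    # Positions where both rankings actually contain the probe.
    return [i for i, (a, b) in enumerate(zip(u, v))
            if not math.isnan(a) and not math.isnan(b)]

ranking_a = zeros_to_missing([1, 2, 0, 3])  # third probe absent
ranking_b = zeros_to_missing([2, 0, 1, 3])  # second probe absent
usable = shared_positions(ranking_a, ranking_b)
```

Only the first and last probes occur in both rankings, so only those two positions would enter the comparison.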

**--cosine** (*use cosine*)

Use the cosine as the correlation measure.
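For reference, the cosine of two vectors is their dot product divided by the product of their lengths. The Python sketch below is a plain textbook version with a hypothetical function name; it does not model how mcxarray handles missing data, and returning NaN for a zero vector is an assumption.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.nan  # assumption: undefined for a zero vector
    return dot / (norm_u * norm_v)

parallel = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
```

Unlike Pearson correlation, the cosine does not subtract the mean of each vector first.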

**-n** mode (*normalization mode*)

If *mode* is set to **z** the data will be normalized
based on z-score. No other modes are currently supported.
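A z-score normalization of a single data vector might look like the Python sketch below. It assumes that each vector is normalized on its own and that the population standard deviation is used; neither detail is specified here, and the function name is hypothetical.

```python
import math

def z_scores(vec):
    # Subtract the mean and divide by the standard deviation.
    n = len(vec)
    mean = sum(vec) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in vec) / n)  # population sd (assumed)
    if sd == 0:
        return [0.0] * n  # assumption: a constant vector maps to zeroes
    return [(x - mean) / sd for x in vec]

normalized = z_scores([2.0, 4.0, 6.0])
```

After this transformation the vector has mean zero and unit variance.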

**-tf** spec (*transform result network*)

**-table-tf** spec (*transform input table before processing*)

The transformation syntax is described in **mcxio(5)**.

**--help** (*print help*)

**-h** (*print help*)

**--version** (*print version information*)

Stijn van Dongen.

See **mcl(1)**,
**mclfaq(7)**,
and **mclfamily(7)** for an overview of all the documentation
and the utilities in the mcl family.

Time: 21:23:48 GMT, April 16, 2011