## RSAT - Help about contingency-stats

This program takes as input a contingency table and calculates various
matching statistics between its rows and columns. These statistics are
described in Brohee and van Helden (2006).

#### INPUT FORMAT

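A contingency table is an N*M table used to compare the contents of two
classifications. Its N rows represent the clusters of the first
classification (considered as the reference), and its M columns the
clusters of the second classification (the query). Each cell X_{i,j}
holds the number of elements shared by reference cluster i and query
cluster j.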

Contingency tables can be generated with the program contingency-table,
or with compare-classes (option -matrix QR).
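To show how the counts X_{i,j} arise, here is a minimal Python sketch that builds a contingency table from two classifications. It assumes each classification is given as a list of (element, cluster) membership pairs, so an element may belong to several clusters. The function name `build_contingency_table` and this input layout are hypothetical; they do not reproduce the file formats read by the RSAT programs contingency-table or compare-classes.

```python
from collections import Counter, defaultdict


def build_contingency_table(reference, query):
    """Count the elements shared by each (reference cluster, query cluster) pair.

    reference, query: lists of (element, cluster) membership pairs
    (hypothetical input layout). Returns (row_names, col_names, X),
    where X[i][j] is the number of elements in both row i and column j.
    """
    # Index the query classification: element -> set of query clusters.
    query_clusters = defaultdict(set)
    for element, cluster in query:
        query_clusters[element].add(cluster)

    # Each distinct reference membership adds 1 to every query cluster
    # that also contains the element.
    counts = Counter()
    for element, ref_cluster in set(reference):
        for q_cluster in query_clusters.get(element, ()):
            counts[(ref_cluster, q_cluster)] += 1

    row_names = sorted({cluster for _, cluster in reference})
    col_names = sorted({cluster for _, cluster in query})
    X = [[counts[(r, c)] for c in col_names] for r in row_names]
    return row_names, col_names, X


reference = [("a", "R1"), ("b", "R1"), ("c", "R2"), ("d", "R2")]
query = [("a", "Q1"), ("b", "Q1"), ("c", "Q1"), ("d", "Q2")]
print(build_contingency_table(reference, query))
# (['R1', 'R2'], ['Q1', 'Q2'], [[2, 0], [1, 1]])
```

Rows and columns are sorted by cluster name here purely for reproducibility; the ordering of rows and columns does not affect any of the statistics below.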

#### OUTPUT FORMAT

A tab-delimited text file with one row per statistic.

#### STATISTICS

**Sn**

Sensitivity. This parameter indicates the fraction of each reference
cluster (row) covered by its best matching query cluster (column).

Sensitivity is calculated at the level of each cell (cell-wise Sn),
of each row (row-wise Sn) and of the whole contingency table
(table-wise Sn).

**Cell-wise sensitivity**

Sn_{i,j} = X_{i,j}/SUM_j(X_{i,j})
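The cell-wise formula divides each count by its row total. A minimal Python sketch, assuming the table is a list of rows of counts and that no row is empty (an empty row would raise a division-by-zero error):

```python
def cellwise_sn(X):
    """Cell-wise sensitivity: Sn_{i,j} = X_{i,j} / SUM_j(X_{i,j}).

    X is a list of rows (reference clusters), each a list of counts
    per column (query cluster). Assumes every row total is nonzero.
    """
    sn = []
    for row in X:
        row_total = sum(row)  # SUM_j(X_{i,j})
        sn.append([x / row_total for x in row])
    return sn


X = [[8, 2, 0],
     [2, 3, 5]]
print(cellwise_sn(X))  # [[0.8, 0.2, 0.0], [0.2, 0.3, 0.5]]
```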

**Row-wise sensitivity**

Sn_{i.} = MAX_j(Sn_{i,j})

The row-wise sensitivity of a row is the maximal sensitivity value
among all the cells of this row.

**Table-wise sensitivity**

Sn = SUM_i(Sn_{i.})/N

The table-wise sensitivity is the average of the row-wise
sensitivity over all the rows of the contingency table.
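The row-wise and table-wise levels build directly on the cell-wise values: take the maximum of each row, then average those maxima over the rows. A short Python sketch under the same assumptions as above (a list of rows of counts, no empty row); the function name `sn_stats` is illustrative only.

```python
def sn_stats(X):
    """Row-wise Sn (max over each row) and table-wise Sn (mean over rows)."""
    # Cell-wise Sn: each count divided by its row total (rows assumed non-empty).
    cell_sn = [[x / sum(row) for x in row] for row in X]
    row_sn = [max(row) for row in cell_sn]  # Sn_{i.} = MAX_j(Sn_{i,j})
    table_sn = sum(row_sn) / len(row_sn)    # average over the N rows
    return row_sn, table_sn


X = [[8, 2, 0],
     [2, 3, 5]]
row_sn, table_sn = sn_stats(X)
print(row_sn, round(table_sn, 3))  # [0.8, 0.5] 0.65
```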

**PPV**

Positive Predictive Value. This parameter indicates the fraction of
each query cluster (column) covered by its best matching reference
cluster (row).

PPV is calculated at the level of each cell (cell-wise PPV), of each
column (column-wise PPV) and of the whole contingency table
(table-wise PPV).

**Cell-wise PPV**

PPV_{i,j} = X_{i,j}/SUM_i(X_{i,j})
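Cell-wise PPV mirrors cell-wise Sn, but normalizes by column totals instead of row totals. A minimal Python sketch, assuming a list of rows of counts with no empty column:

```python
def cellwise_ppv(X):
    """Cell-wise PPV: PPV_{i,j} = X_{i,j} / SUM_i(X_{i,j}) (column totals)."""
    # Column totals SUM_i(X_{i,j}); assumes no column is empty.
    col_totals = [sum(col) for col in zip(*X)]
    return [[x / t for x, t in zip(row, col_totals)] for row in X]


X = [[8, 2, 0],
     [2, 3, 5]]
print(cellwise_ppv(X))  # [[0.8, 0.4, 0.0], [0.2, 0.6, 1.0]]
```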

**Column-wise PPV**

PPV_{.j} = MAX_i(PPV_{i,j})

The column-wise PPV of a column is the maximal PPV value among all
the cells of this column.

**Table-wise PPV**

PPV = SUM_j(PPV_{.j})/M

The table-wise PPV is the average of the column-wise PPV over
all the columns of the contingency table.
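Column-wise PPV takes the maximum down each column of the cell-wise PPV values, and table-wise PPV averages those maxima over the columns. A Python sketch, assuming a list of rows of counts with no empty column; `ppv_stats` is an illustrative name.

```python
def ppv_stats(X):
    """Column-wise PPV (max over each column) and table-wise PPV (mean over columns)."""
    col_totals = [sum(col) for col in zip(*X)]  # assumes no empty column
    cell_ppv = [[x / t for x, t in zip(row, col_totals)] for row in X]
    col_ppv = [max(col) for col in zip(*cell_ppv)]  # PPV_{.j} = MAX_i(PPV_{i,j})
    table_ppv = sum(col_ppv) / len(col_ppv)         # average over the M columns
    return col_ppv, table_ppv


X = [[8, 2, 0],
     [2, 3, 5]]
col_ppv, table_ppv = ppv_stats(X)
print(col_ppv, round(table_ppv, 3))  # [0.8, 0.6, 1.0] 0.8
```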

**Acc.geom**

Geometric accuracy. This statistic reflects the tradeoff between
sensitivity and positive predictive value, computed as the geometric
mean of Sn and PPV.

Acc.geom = sqrt(Sn*PPV)
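As a small numeric illustration, the sketch below applies the formula to table-wise values Sn = 0.65 and PPV = 0.8, chosen purely as example inputs.

```python
import math


def geometric_accuracy(sn, ppv):
    """Acc.geom = sqrt(Sn * PPV): geometric mean of table-wise Sn and PPV."""
    return math.sqrt(sn * ppv)


print(round(geometric_accuracy(0.65, 0.8), 3))  # 0.721
print(round(geometric_accuracy(1.0, 0.01), 3))  # 0.1
```

The geometric mean penalizes imbalance between the two terms: with Sn = 1.0 and PPV = 0.01, Acc.geom is 0.1, whereas the arithmetic mean would be 0.505.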

**Sep**

Separation. At the level of each cell (cell-wise separation), the
separation is defined as the product of Sn and PPV.

**Cell-wise separation**

sep_{i,j}=Sn_{i,j}*PPV_{i,j}
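A minimal Python sketch of cell-wise separation, multiplying the row-normalized and column-normalized value of each cell. It assumes a list of rows of counts with no empty row or column.

```python
def cellwise_separation(X):
    """sep_{i,j} = Sn_{i,j} * PPV_{i,j} (assumes no empty row or column)."""
    row_totals = [sum(row) for row in X]
    col_totals = [sum(col) for col in zip(*X)]
    # (x / row total) is Sn_{i,j}; (x / column total) is PPV_{i,j}.
    return [[(x / rt) * (x / ct) for x, ct in zip(row, col_totals)]
            for row, rt in zip(X, row_totals)]


X = [[8, 2, 0],
     [2, 3, 5]]
sep = cellwise_separation(X)
print([[round(v, 3) for v in row] for row in sep])
# [[0.64, 0.08, 0.0], [0.04, 0.18, 0.5]]
```

Because Sn_{i,j} and PPV_{i,j} share the numerator X_{i,j}, the cell-wise separation can equivalently be written X_{i,j}^2 / (SUM_j(X_{i,j}) * SUM_i(X_{i,j})).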

**Column-wise separation**

Column-wise separation is defined at the level of each column as the
sum of the separation values of all the cells in this column.

sep_{.j} = SUM_i( sep_{i,j})

**Row-wise separation**

Row-wise separation is defined at the level of each row as the sum of
the separation values of all the cells in this row.

sep_{i.} = SUM_j(sep_{i,j})
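The column-wise and row-wise separations are simply the column sums and row sums of the cell-wise separation matrix. A Python sketch, assuming a list of rows of counts with no empty row or column; `separation_margins` is an illustrative name.

```python
def separation_margins(X):
    """Column-wise (sep_{.j}) and row-wise (sep_{i.}) sums of cell-wise separation."""
    row_totals = [sum(row) for row in X]
    col_totals = [sum(col) for col in zip(*X)]
    # Cell-wise separation Sn_{i,j} * PPV_{i,j}; assumes no empty row/column.
    sep = [[(x / rt) * (x / ct) for x, ct in zip(row, col_totals)]
           for row, rt in zip(X, row_totals)]
    col_sep = [sum(col) for col in zip(*sep)]  # sep_{.j} = SUM_i(sep_{i,j})
    row_sep = [sum(row) for row in sep]        # sep_{i.} = SUM_j(sep_{i,j})
    return col_sep, row_sep


col_sep, row_sep = separation_margins([[8, 2, 0], [2, 3, 5]])
print([round(v, 3) for v in col_sep], [round(v, 3) for v in row_sep])
# [0.68, 0.26, 0.5] [0.72, 0.72]
```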

**Table-wise separation**

Three table-wise statistics are computed for separation.

Average column-wise separation:

sep_c = AVG_j(sep_{.j})

Average row-wise separation:

sep_r = AVG_i(sep_{i.})

Table-wise separation, the geometric mean of sep_r and sep_c:

sep = sqrt(sep_r*sep_c)
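Putting the separation levels together, the sketch below computes the two averages and their geometric mean from a table of counts. It is a minimal Python illustration, assuming a list of rows of counts with no empty row or column; `tablewise_separation` is an illustrative name.

```python
import math


def tablewise_separation(X):
    """Return (sep_c, sep_r, sep) for a contingency table X (lists of counts)."""
    row_totals = [sum(row) for row in X]
    col_totals = [sum(col) for col in zip(*X)]
    # Cell-wise separation; assumes every row and column total is nonzero.
    cell = [[(x / rt) * (x / ct) for x, ct in zip(row, col_totals)]
            for row, rt in zip(X, row_totals)]
    col_sep = [sum(col) for col in zip(*cell)]
    row_sep = [sum(row) for row in cell]
    sep_c = sum(col_sep) / len(col_sep)  # AVG_j(sep_{.j})
    sep_r = sum(row_sep) / len(row_sep)  # AVG_i(sep_{i.})
    return sep_c, sep_r, math.sqrt(sep_r * sep_c)


sep_c, sep_r, sep = tablewise_separation([[8, 2, 0], [2, 3, 5]])
print(round(sep_c, 3), round(sep_r, 3), round(sep, 3))  # 0.48 0.72 0.588
```

Since sep_r and sep_c average the same grand total S of cell-wise separation over the N rows and the M columns respectively, the table-wise separation reduces to sep = S/sqrt(N*M); in this example S = 1.44, N = 2 and M = 3.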

#### REFERENCES

Brohee, S. & van Helden, J. (2006). Evaluation of clustering algorithms
for protein-protein interaction networks. BMC Bioinformatics 7, 488.

#### RETURN FIELDS

**stats**

table-wise statistics

**rowstats**

row-wise statistics (one line per row of the contingency table)

**colstats**

column-wise statistics (one line per column of the contingency
table)

**tables**

full tables for each statistic (counts, Sn, PPV, separation).

**margins**

marginal statistics alongside the tables (requires also returning the
tables).