NAME

oligo-diff


VERSION

$program_version


DESCRIPTION

Compare frequencies of oligonucleotides between two input sequence files, and return oligos that are significantly enriched in one of the files relative to the other.


AUTHORS

Jacques.van-Helden@univ-amu.fr


CATEGORY

util


USAGE

oligo-diff -file1 first_seq_file -file2 second_seq_file [-o outputfile] [-v #] [...]


INPUT FORMAT

The program takes as input a pair of sequence files in fasta format.


OUTPUT FORMAT

The output is a tab-delimited file with one row per oligonucleotide and one column per statistic. The content of each column is described in the header of the output (this requires a verbosity of at least 1).


STATISTICAL MODEL


SEE ALSO

oligo-analysis

The programs oligo-diff and oligo-analysis serve related purposes: discovering exceptional oligonucleotides. The difference is that oligo-analysis considers a single sequence file, and compares observed oligo frequencies with those expected from a background model (Bernoulli or Markov). This background model is generally estimated from a set of background sequences.

When one wants to compare a small sequence file (e.g. 50 promoters of co-expressed genes) to a large one (e.g. the 6000 other promoters of the considered organism), oligo-diff should return nearly the same results as oligo-analysis with a background model estimated from the large file. The slight differences come from the use of the hypergeometric (oligo-diff) versus binomial (oligo-analysis) statistics.
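The exact P-value formulas used by oligo-diff and oligo-analysis are not given here. The following minimal sketch only illustrates why the two statistics give close but not identical results, under two stated assumptions: the hypergeometric test treats the word positions of file1 as drawn without replacement from the pooled positions of both files, and the binomial test uses a fixed per-position probability estimated from the large file alone. All counts are hypothetical, and the helper names are illustrative only.

```python
from math import lgamma, log, exp

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_sf(x, N, K, n):
    """P(X >= x) for a hypergeometric X: n draws without replacement
    from N word positions, K of which are the oligo of interest."""
    lo = max(x, 0, n - (N - K))   # smallest feasible count at or above x
    hi = min(K, n)                # largest feasible count
    log_total = log_comb(N, n)
    return sum(exp(log_comb(K, i) + log_comb(N - K, n - i) - log_total)
               for i in range(lo, hi + 1))

def binom_sf(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), with 0 < p < 1."""
    return sum(exp(log_comb(n, i) + i * log(p) + (n - i) * log(1 - p))
               for i in range(max(x, 0), n + 1))

# Hypothetical counts: the oligo occurs 15 times among 1000 word positions
# of file1 (small set) and 200 times among 50000 positions of file2 (large set).
occ1, total1 = 15, 1000
occ2, total2 = 200, 50000

p_hyper = hypergeom_sf(occ1, total1 + total2, occ1 + occ2, total1)
p_binom = binom_sf(occ1, total1, occ2 / total2)
print(f"hypergeometric P = {p_hyper:.3g}, binomial P = {p_binom:.3g}")
```

Working in log space avoids overflowing floats with very large binomial coefficients. When the large file dominates the pooled counts, both P-values are of the same order of magnitude, which matches the "slight differences" described above.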

count-words

oligo-diff calls the program count-words to count oligonucleotide occurrences in the two input sequence files. The program count-words is part of the RSAT suite (it is written in C, and has to be compiled as explained in the RSAT installation guide).


WISH LIST


OPTIONS

-v #

Level of verbosity (detail in the warning messages during execution)

-h

Display full help message

-help

Same as -h

-file1 first_seq_file

First sequence file.

-file2 second_seq_file

Second sequence file.

-l oligo_len

Oligonucleotide length.

-1str

Count oligonucleotides on a single strand only.

Alternative option: -2str

-2str

Sum oligonucleotides on both strands.

More precisely, each pair of reverse complements is counted as a single motif (the count is performed on a single strand, but pairs of reverse complements are merged).

Alternative option: -1str
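The difference between -1str and -2str can be illustrated with the minimal sketch below. It is not the count-words implementation. The parameter k plays the role of -l oligo_len, counting allows overlaps, and the merged key "word|revcomp" (alphabetically smaller word first, palindromes kept as a single word) is a hypothetical naming convention chosen for this sketch.

```python
from collections import Counter

# Complement table for upper- and lower-case nucleotides.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(word):
    """Return the reverse complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]

def count_1str(seq, k):
    """-1str analogue: count every k-mer of the given strand (overlaps allowed)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def count_2str(seq, k):
    """-2str analogue: count on a single strand, then merge each word with
    its reverse complement. The 'word|revcomp' key is hypothetical."""
    merged = Counter()
    for word, n in count_1str(seq, k).items():
        first, second = sorted((word, revcomp(word)))
        key = first if first == second else f"{first}|{second}"
        merged[key] += n
    return merged

print(count_1str("AACGTT", 2))
print(count_2str("AACGTT", 2))
```

On the sequence AACGTT, the single-strand counts of AA and TT (reverse complements of each other) are summed into one motif, while the palindrome CG stays on its own.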

-noov

Do not accept overlap between successive occurrences of the same word. Only renewing occurrences are counted.

E.g.: TATATATATATA is counted as 2 occurrences of TATATA

Alternative option: -ovlp

-ovlp

Count all occurrences of self-overlapping words.

E.g.: TATATATATATA is counted as 4 occurrences of TATATA

Alternative option: -noov
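The two counting modes can be reproduced with the short sketch below, assuming a left-to-right scan. With -ovlp every start position is counted. With -noov, scanning resumes after the end of each match, so only renewing occurrences are counted. The function name is hypothetical.

```python
def count_occurrences(seq, word, overlap=True):
    """Count occurrences of word in seq.
    overlap=True  mimics -ovlp: every matching start position is counted.
    overlap=False mimics -noov: after a match, the scan resumes after its end."""
    count = 0
    step = 1 if overlap else len(word)
    i = seq.find(word)
    while i != -1:
        count += 1
        i = seq.find(word, i + step)
    return count

print(count_occurrences("TATATATATATA", "TATATA", overlap=True))
print(count_occurrences("TATATATATATA", "TATATA", overlap=False))
```

Both calls reproduce the examples above: 4 occurrences with overlaps, 2 without.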

-o outputfile

If no output file is specified, the standard output is used. This allows the command to be used within a pipe.

-lth key value

Lower threshold on an output field.

Supported fields for threshold: occ, sig

-uth key value

Upper threshold on an output field.
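The exact filtering semantics of -lth and -uth are not spelled out above. The minimal sketch below assumes that a row is kept only if every thresholded field satisfies its bound, and that bounds are inclusive; both points are assumptions. The rows are hypothetical stand-ins for parsed output lines, and the function name is illustrative only.

```python
def apply_thresholds(rows, lower=None, upper=None):
    """Keep rows whose fields satisfy every threshold.
    lower/upper map a field name (e.g. 'occ', 'sig') to a bound;
    bounds are treated as inclusive (an assumption of this sketch)."""
    lower = lower or {}
    upper = upper or {}
    return [
        row for row in rows
        if all(row[field] >= bound for field, bound in lower.items())
        and all(row[field] <= bound for field, bound in upper.items())
    ]

# Hypothetical parsed output rows.
rows = [
    {"seq": "acgtac", "occ": 12, "sig": 3.1},
    {"seq": "tatata", "occ": 4, "sig": 0.2},
    {"seq": "gataag", "occ": 30, "sig": -1.5},
]

# Analogous to: -lth occ 5 -lth sig 0
print([r["seq"] for r in apply_thresholds(rows, lower={"occ": 5, "sig": 0})])
```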