RSAT - random-seq manual
Generate random DNA sequences according to various probabilistic models (Markov models or independently distributed nucleotides)
The length of each sequence.
Number of sequences
The number of sequences to generate. Most sequence formats support multi-sequence files.
Line width. To obtain a single line per sequence, set this option to 0.
Various sequence formats can be chosen through a pop-up menu.
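The interaction between sequence length, number of sequences and line width can be sketched as follows. This is a minimal illustration, not the RSAT implementation: FASTA is assumed as the output format, the function names and the `rand_N` identifiers are hypothetical, and nucleotides are drawn with equal probabilities.

```python
import random


def equiprobable_sequences(length, number, seed=None):
    """Draw `number` sequences of `length` nucleotides, each letter equally likely."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(number)]


def format_fasta(sequences, line_width=60):
    """Return multi-sequence FASTA text.

    A line_width of 0 writes each sequence on a single line.
    """
    lines = []
    for i, seq in enumerate(sequences, start=1):
        lines.append(f">rand_{i}")  # hypothetical identifier scheme
        if line_width <= 0:
            lines.append(seq)
        else:
            # Split the sequence into chunks of at most line_width characters.
            lines.extend(seq[j:j + line_width] for j in range(0, len(seq), line_width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_fasta(equiprobable_sequences(25, 2, seed=1), line_width=10))
```

With `line_width=10`, a 25-nucleotide sequence is written on three lines (10, 10 and 5 characters); with `line_width=0` it stays on one line.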
Random sequences can be generated according to different probabilistic models.
- Equiprobable nucleotides
This is the simplest model: all nucleotides have the same prior probability.
- Independent nucleotides with distinct probabilities
A specific prior probability can be assigned to the nucleotides (A and T are grouped, as are C and G). These probabilities are constant along the sequence, i.e. each nucleotide is generated independently of the preceding and succeeding nucleotides.
- Markov chains (calibrated on upstream non-coding frequencies)
The random sequence has the same oligonucleotide composition as observed in the non-coding regions located upstream of all genes of the selected organism. This is obtained by a Markov chain process, where nucleotide probabilities vary at each position, depending on the preceding nucleotides.
- Organism: The reference organism (oligonucleotide frequencies are pre-calculated for each supported organism, on the basis of the complete set of upstream sequences).
- Oligonucleotide size: This determines which expected oligonucleotide calibration table is used. The Markov chain order is this value minus one. For example:
- Calibrating with hexanucleotides (oligonucleotide length = 6) means that the nucleotide at each position depends on the 5 preceding nucleotides. This is thus a Markov chain of order 5.
- Calibrating on single nucleotides (oligonucleotide length = 1) means that each nucleotide is chosen independently of the preceding ones. This is thus a Bernoulli model (or Markov chain of order 0).
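The relation between oligonucleotide size and Markov order can be illustrated with a short sketch. It is not the RSAT code: the function names are hypothetical, the frequency table is passed as a plain dictionary rather than read from a calibration file, and two choices are assumptions made for the example (the first nucleotides are drawn from the prefix frequencies of the table, and a prefix absent from the table falls back to equiprobable nucleotides). In `grouped_frequencies`, the value given for A/T is assumed to be the probability of A and of T individually.

```python
import random

ALPHABET = "ACGT"


def grouped_frequencies(at_prob):
    """Single-nucleotide table: A and T share at_prob, C and G share the remainder.

    Assumption: at_prob is the probability of A (and of T) taken individually.
    """
    if not 0 <= at_prob <= 0.5:
        raise ValueError("at_prob must lie between 0 and 0.5")
    cg_prob = 0.5 - at_prob
    return {"A": at_prob, "T": at_prob, "C": cg_prob, "G": cg_prob}


def transition_table(oligo_freqs, oligo_size):
    """Group oligo frequencies by prefix: table[prefix][next_nucleotide] = frequency."""
    table = {}
    for oligo, freq in oligo_freqs.items():
        if len(oligo) != oligo_size:
            raise ValueError(f"expected oligos of length {oligo_size}, got {oligo!r}")
        table.setdefault(oligo[:-1], {})[oligo[-1]] = freq
    return table


def random_sequence(length, oligo_freqs, oligo_size, rng):
    """Generate one sequence with a Markov chain of order oligo_size - 1."""
    order = oligo_size - 1
    table = transition_table(oligo_freqs, oligo_size)

    # Assumed initialisation: draw the first `order` nucleotides from the
    # prefix frequencies (sum over all oligos sharing that prefix).
    prefixes = list(table)
    prefix_weights = [sum(table[p].values()) for p in prefixes]
    seq = list(rng.choices(prefixes, weights=prefix_weights)[0])[:length]

    while len(seq) < length:
        prefix = "".join(seq[len(seq) - order:])  # empty string when order is 0
        successors = table.get(prefix)
        if not successors:
            # Assumed fallback for a prefix missing from the table.
            seq.append(rng.choice(ALPHABET))
            continue
        letters = list(successors)
        weights = [successors[n] for n in letters]
        seq.append(rng.choices(letters, weights=weights)[0])
    return "".join(seq)


if __name__ == "__main__":
    rng = random.Random(7)
    # Oligonucleotide size 1: Bernoulli model (Markov order 0).
    print(random_sequence(40, grouped_frequencies(0.3), 1, rng))
    # Oligonucleotide size 2: Markov order 1, toy dinucleotide table.
    print(random_sequence(40, {"AA": 0.4, "AC": 0.1, "CA": 0.1, "CC": 0.4}, 2, rng))
```

With oligonucleotide size 1 the prefix is always empty, so every position is drawn from the same four probabilities. With larger sizes, the weights used at each position are those of the oligonucleotides that extend the preceding `oligo_size - 1` nucleotides.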