The theoretical background required for this tutorial can be found in the RSAT course.
In particular, we recommend reading the following slides before starting this tutorial.
In this tutorial, we will become familiar with the concept of word occurrences (i.e. the number of instances of a given oligonucleotide) in DNA sequences.
Assuming a 5th-order Markov background model calibrated on all upstream non-coding sequences of the yeast Saccharomyces cerevisiae, how many occurrences of the word GATACA would you expect by chance in a 5 kb sequence?
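The expected count can be sketched as the number of possible word positions multiplied by the probability of the word under the background model. In a 5th-order Markov model, the probability of a 6-letter word is the probability of its 5-letter prefix times the transition probability of the last letter given that prefix. The sketch below assumes single-strand counting, and the two probabilities are hypothetical placeholders, not values from the yeast upstream model; replace them with the values read from the actual background file.

```python
def expected_occurrences(seq_len, word_len, word_prob):
    """Expected number of occurrences of a word on a single strand."""
    positions = seq_len - word_len + 1  # number of possible start positions
    return positions * word_prob


# HYPOTHETICAL values, for illustration only (not from the yeast model).
prefix_prob = 0.001      # assumed P(GATAC)
transition_prob = 0.3    # assumed P(A | GATAC)
word_prob = prefix_prob * transition_prob  # P(GATACA) under the 5th-order model

print(expected_occurrences(5000, 6, word_prob))

# Sanity check with equiprobable, independent nucleotides: P(word) = 1 / 4**6.
print(expected_occurrences(5000, 6, 1 / 4**6))
```

With equiprobable nucleotides, the 4,995 possible positions and a word probability of 1/4096 give roughly 1.22 expected occurrences, which is a useful order of magnitude to compare against the model-based value.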
Using the same background model, generate 1,000 random sequences of length L = 5,000 bp and compute the frequency distribution of the word GATACA. Does the observed mean match your expectation?
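For readers who want to see what this simulation does internally, here is a minimal sketch in plain Python. It is not the RSAT implementation. The function names and the `transitions` dictionary (mapping each 5-letter prefix to the probabilities of the next nucleotide) are assumptions made for illustration; prefixes missing from the dictionary fall back to equiprobable nucleotides, and the first 5 letters are drawn uniformly, which a real background model would not do.

```python
import random
from collections import Counter

ALPHABET = "ACGT"
UNIFORM = {nt: 0.25 for nt in ALPHABET}


def generate_sequence(length, order, transitions, rng):
    """Draw one random sequence from a Markov model of the given order."""
    # Simplification: the initial prefix is drawn uniformly.
    seq = [rng.choice(ALPHABET) for _ in range(min(order, length))]
    while len(seq) < length:
        prefix = "".join(seq[-order:]) if order > 0 else ""
        probs = transitions.get(prefix, UNIFORM)  # uniform fallback (assumption)
        weights = [probs[nt] for nt in ALPHABET]
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq)


def count_occurrences(sequence, word):
    """Count occurrences of word on one strand, overlaps included."""
    count = 0
    start = sequence.find(word)
    while start != -1:
        count += 1
        start = sequence.find(word, start + 1)
    return count


def simulate_counts(n_sequences, length, word, order, transitions, seed=0):
    """Return the occurrence count of word in each simulated sequence."""
    rng = random.Random(seed)
    return [
        count_occurrences(generate_sequence(length, order, transitions, rng), word)
        for _ in range(n_sequences)
    ]


def summarize(counts):
    """Return (mean, {occurrences: number of sequences})."""
    distribution = dict(sorted(Counter(counts).items()))
    return sum(counts) / len(counts), distribution
```

Calling `summarize(simulate_counts(1000, 5000, "GATACA", 5, transitions))` with a `transitions` table loaded from the background model gives the observed mean and the distribution to compare with the expected count. Pure Python is slow for 1,000 sequences of 5,000 bp, so start with a smaller number of sequences when testing.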
What fraction of the sequences contains at least 3 occurrences of the word GATACA?
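The empirical fraction can be compared with a theoretical tail probability. The sketch below assumes that the number of occurrences is approximately Poisson-distributed with mean equal to the expected count, which is a common approximation for rare words that cannot overlap with themselves (as is the case for GATACA); it is an approximation, not the exact distribution. The function names and the example counts are hypothetical.

```python
import math


def fraction_at_least(counts, threshold):
    """Empirical fraction of sequences with at least `threshold` occurrences."""
    return sum(1 for c in counts if c >= threshold) / len(counts)


def poisson_tail(lam, threshold):
    """P(X >= threshold) for X ~ Poisson(lam), via the complement of the CDF."""
    cdf = sum(
        math.exp(-lam) * lam**k / math.factorial(k) for k in range(threshold)
    )
    return 1.0 - cdf


# HYPOTHETICAL per-sequence counts, standing in for the simulation output.
example_counts = [0, 1, 1, 2, 3, 0, 4, 1]
print(fraction_at_least(example_counts, 3))

# Illustrative mean: expected count with equiprobable nucleotides (4995 / 4096).
print(poisson_tail(4995 / 4096, 3))
```

A large gap between the empirical fraction and the Poisson tail would suggest either that the mean used is wrong or that the Poisson assumption does not hold for the word under study.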
You can now return to the tutorial main page and continue with the next tutorials.