RSA-tools - Tutorials - template

Contents

  1. Prerequisite
  2. Introduction
  3. Questions
  4. Example of utilization
  5. Interpreting the result
  6. Additional exercises
  7. Bibliography

Prerequisite

The theoretical background required for this tutorial can be found in the RSAT course.

In particular, we recommend reading the following slides before starting this tutorial.

Introduction

In this tutorial, we will become familiar with the concept of word occurrences (i.e. the number of instances of a given oligonucleotide) in DNA sequences.

Exercise

  1. Assuming a 5th-order Markov background model calibrated on all upstream non-coding sequences of the yeast Saccharomyces cerevisiae, how many occurrences of the word GATACA would you expect by chance in a 5 kb sequence?

  2. Using the same background model, generate 1,000 random sequences of length L = 5000 bp and compute the frequency distribution of the number of occurrences of the word GATACA per sequence. Does the observed mean correspond to your expectation?

  3. What fraction of the sequences contains at least 3 occurrences of the word?
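For question 1, the expected number of occurrences on a single strand is the number of possible positions, L - k + 1 (with k the word length), multiplied by the probability of the word under the background model. Under a Markov model of order m, that probability is the frequency of the first m letters times the conditional probability of each following letter given its m predecessors. The Python sketch below illustrates this calculation. The function names and the numerical values are assumptions made for illustration: the values are equiprobable placeholders, not the yeast upstream model, so the printed number is not the answer to the exercise. Counting on both strands would additionally require the expectation for the reverse complement TGTATC.

```python
def word_probability(word, order, prefix_freq, transitions):
    """P(word) under a Markov model of the given order.

    prefix_freq maps each word of `order` letters to its frequency;
    transitions maps each such prefix to {next_letter: conditional probability}.
    """
    prob = prefix_freq[word[:order]]
    for i in range(order, len(word)):
        prob *= transitions[word[i - order:i]][word[i]]
    return prob


def expected_occurrences(word, seq_length, order, prefix_freq, transitions):
    """Expected count on one strand: (L - k + 1) * P(word)."""
    positions = seq_length - len(word) + 1
    return positions * word_probability(word, order, prefix_freq, transitions)


# Placeholder parameters (equiprobable residues), NOT the yeast upstream model.
prefix_freq = {"GATAC": 0.25 ** 5}
transitions = {"GATAC": {"A": 0.25}}

expected = expected_occurrences("GATACA", 5000, 5, prefix_freq, transitions)
print(round(expected, 3))  # 1.219 with these placeholder values
```

Replacing the two placeholder dictionaries with the frequencies of the actual background model gives the expectation asked for in question 1.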

Tips

  1. By default, the program dna-pattern returns the matching positions of the query patterns in the input sequences, but the options can be changed to obtain a count table, indicating the number of occurrences of a given pattern for each input sequence.
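To make the notion of a count table concrete, here is a minimal Python sketch that builds one: one row per input sequence, one column per pattern. It only illustrates the idea and does not reproduce the options or the output format of dna-pattern. The function names and the toy sequences are hypothetical, and matches are counted on a single strand, overlaps included.

```python
def count_overlapping(seq, word):
    """Count occurrences of word in seq, including overlapping ones."""
    count = 0
    pos = seq.find(word)
    while pos != -1:
        count += 1
        pos = seq.find(word, pos + 1)
    return count


def count_table(sequences, patterns):
    """Return {sequence_id: {pattern: count}} for every sequence and pattern."""
    table = {}
    for seq_id, seq in sequences.items():
        seq = seq.upper()  # make the search case-insensitive
        table[seq_id] = {p: count_overlapping(seq, p) for p in patterns}
    return table


# Toy input, invented for illustration.
sequences = {"seq1": "ttGATACAccGATACA", "seq2": "ACGTACGT"}
patterns = ["GATACA", "ACG"]
table = count_table(sequences, patterns)

# Print the table with tab-separated columns.
print("\t".join(["seq"] + patterns))
for seq_id, row in table.items():
    print("\t".join([seq_id] + [str(row[p]) for p in patterns]))
```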

Solution

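Questions 2 and 3 can also be explored with a small stand-alone simulation. The Python sketch below is one possible approach under stated assumptions, not the solution of the tutorial: it draws sequences from a Markov chain, counts the word in each sequence, and reports the distribution of counts, their mean, and the fraction of sequences with at least 3 occurrences. All names are hypothetical. The transition table is left empty, so every residue is drawn with probability 0.25; a realistic run would fill `transitions` with the conditional probabilities of the yeast upstream model. The demonstration uses 100 sequences to keep the run time short.

```python
import random
import re
from collections import Counter

ALPHABET = "ACGT"


def random_sequence(length, order, transitions, rng):
    """Draw one sequence from a Markov chain of the given order.

    transitions maps a prefix of `order` letters to four weights (A, C, G, T).
    Prefixes absent from the table fall back to equiprobable residues.
    Simplification: the first `order` letters are drawn equiprobably.
    """
    uniform = [0.25] * 4
    seq = rng.choices(ALPHABET, k=order)
    while len(seq) < length:
        prefix = "".join(seq[-order:]) if order else ""
        weights = transitions.get(prefix, uniform)
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq[:length])


def count_occurrences(seq, word):
    """Count occurrences on one strand; the lookahead allows overlaps."""
    return len(re.findall("(?=" + re.escape(word) + ")", seq))


def simulate(word, n_seq, length, order, transitions, seed=0):
    """Return (distribution of counts, mean count, fraction with >= 3)."""
    rng = random.Random(seed)
    counts = [
        count_occurrences(random_sequence(length, order, transitions, rng), word)
        for _ in range(n_seq)
    ]
    distribution = Counter(counts)
    mean = sum(counts) / n_seq
    frac_at_least_3 = sum(c >= 3 for c in counts) / n_seq
    return distribution, mean, frac_at_least_3


# Placeholder model: empty table, so all residues are equiprobable.
distribution, mean, frac = simulate("GATACA", n_seq=100, length=5000,
                                    order=5, transitions={})
for occ in sorted(distribution):
    print(occ, distribution[occ])
print("mean:", mean, "fraction with >= 3:", frac)
```

Calling `simulate` with `n_seq=1000` matches the setting of question 2; the observed mean can then be compared with the expectation computed for question 1 under the same model.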

Next steps

You can now return to the tutorial main page and continue with the next tutorials.


Last update 15 Jan 2012 - by