The simplest way to represent a transcription factor binding site is as a string over the 4-letter alphabet of DNA sequences: A, C, G and T.
Unfortunately, transcription factor binding motifs (TFBMs) are generally not restricted to a single, perfectly specified 4-letter string. The plain DNA alphabet is therefore insufficient to represent partially specified or nonspecific residues at the DNA/factor interface.
More elaborate representations have been developed for partially specified motifs: the IUPAC code, regular expressions and position-specific scoring matrices. These representations are supported by the RSAT pattern-matching programs (dna-pattern, matrix-scan).
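The IUPAC nucleotide code extends the 4-letter alphabet with the following symbols for partially specified positions:

| Symbol | Nucleotides | Mnemonic |
|---|---|---|
| R | A or G | puRines |
| Y | C or T | pYrimidines |
| W | A or T | Weak hydrogen bonding |
| S | G or C | Strong hydrogen bonding |
| M | A or C | aMino group at common position |
| K | G or T | Keto group at common position |
| H | A, C or T | not G |
| B | G, C or T | not A |
| V | G, A or C | not T |
| D | G, A or T | not C |
| N | G, A, C or T | aNy |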
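The table above is enough to translate any IUPAC motif into a regular expression. The Python sketch below does this with a hypothetical `iupac_to_regex` helper (an illustration, not RSAT's own code) and scans one strand for, e.g., the GATA-factor consensus WGATAR. It assumes the motif contains only IUPAC letters and does not scan the reverse strand.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for (see table above).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "M": "AC", "K": "GT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into an equivalent regular expression.

    Hypothetical helper; case is ignored, as in RSAT.
    """
    parts = []
    for letter in motif.upper():
        bases = IUPAC[letter]  # raises KeyError for non-IUPAC characters
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_sites(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of (possibly overlapping) matches on one strand."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(iupac_to_regex("WGATAR"))                  # [AT]GATA[AG]
print(find_sites("WGATAR", "agataagctgatag"))    # [0, 8]
```

Translating degenerate letters into character classes keeps the motif compact while letting any standard regex engine do the scanning.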
Regular expressions are a convenient way to express complex patterns as strings. This formalism supports many syntactic features that are beyond the scope of this tutorial; a complete description can be found in many sources, e.g. Perl textbooks. We will only provide a few examples of useful expressions.
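As an illustration, the Python sketch below applies a few common regular-expression constructs to short, made-up sequences with the standard `re` module. Python's syntax is close to Perl's, but it is not guaranteed to be identical to what every RSAT program accepts, so treat these as generic regex examples. The spacer example mimics the CGG-N11-CCG spacing of Gal4 binding sites.

```python
import re

# (pattern, construct it illustrates, made-up test sequence)
EXAMPLES = [
    ("[AG]GATA", "character class ([AG] is IUPAC R)", "CCAGATAT"),
    ("GATA|TATC", "alternation (GATA core on either strand)", "AAGATAACCTTATCTT"),
    ("CGG.{11}CCG", "fixed-length spacer (CGG-N11-CCG, Gal4-like)", "TTCGGAGGACAGTCCTCCGAA"),
    ("(CA){3,}", "repeated group (three or more CA dinucleotides)", "GTCACACACAG"),
]


def matches(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match, ignoring case."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, sequence, re.IGNORECASE)]


for pattern, meaning, seq in EXAMPLES:
    print(f"{pattern:<14} {meaning}: {matches(pattern, seq)}")
```

Note that `re.finditer` reports non-overlapping matches only; overlapping sites need a lookahead pattern.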
RSAT supports patterns described as combinations of the IUPAC alphabet and regular expressions.
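One way to picture such mixed patterns is to expand each degenerate IUPAC letter into a character class while copying the regular-expression syntax unchanged. The hypothetical `mixed_to_regex` helper below does this in Python. It assumes letters are only ever used as nucleotide codes (no letter-based regex escapes, no IUPAC letters inside existing brackets) and is not necessarily how RSAT processes such patterns.

```python
import re

# Degenerate IUPAC letters and the bases they stand for; A, C, G, T pass through unchanged.
IUPAC_CLASSES = {
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def mixed_to_regex(pattern: str) -> str:
    """Expand IUPAC letters into [...] classes and copy every other character verbatim.

    Hypothetical helper: every letter is assumed to be a nucleotide code.
    """
    out = []
    for ch in pattern.upper():
        out.append(f"[{IUPAC_CLASSES[ch]}]" if ch in IUPAC_CLASSES else ch)
    return "".join(out)


# N{11} combines an IUPAC letter with a regex repetition count.
regex = mixed_to_regex("CGGN{11}CCG")
print(regex)  # CGG[ACGT]{11}CCG
print([m.start() for m in re.finditer(regex, "ttcggaggacagtcctccgaa", re.IGNORECASE)])  # [2]

# Alternation of a motif (WGATAR) and its reverse complement (YTATCW).
print(mixed_to_regex("WGATAR|YTATCW"))  # [AT]GATA[AG]|[CT]TATC[AT]
```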
Whichever string-based representation is used, upper and lower case are considered equivalent by RSAT pattern matching and motif discovery algorithms.
However, some programs support a filtering option that masks either lowercase or uppercase letters before the analysis starts. This option is useful when a specific meaning is attached to lower- or uppercase letters. For example, the "Get DNA" tool of the UCSC Genome Browser can mark specific sequence types in lower- or uppercase (e.g. repetitive sequences, genes, non-coding regions, ...).
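The two behaviours described above (case-insensitive matching, and optional masking of one case before the search) can be sketched in Python as follows. `mask_lowercase` and `count_hits` are hypothetical helpers, not RSAT options. Masked positions are replaced by N so that they can no longer match, and the lowercase stretch in the made-up sequence stands for, e.g., a soft-masked repeat.

```python
import re


def mask_lowercase(sequence: str, mask_char: str = "N") -> str:
    """Replace every lowercase letter by mask_char (hypothetical helper)."""
    return "".join(mask_char if ch.islower() else ch for ch in sequence)


def count_hits(pattern: str, sequence: str) -> int:
    """Count non-overlapping matches, treating upper and lower case as equivalent."""
    return len(re.findall(pattern, sequence, re.IGNORECASE))


seq = "GATAAGcctgataagTT"  # made-up example: lowercase = masked region

print(count_hits("GATAAG", seq))                  # 2: case is ignored
print(mask_lowercase(seq))                        # GATAAGNNNNNNNNNTT
print(count_hits("GATAAG", mask_lowercase(seq)))  # 1: lowercase site masked out
```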