Parameters for the operation retrieve_seq.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Organism. Words need to be underscore separated (example: Escherichia_coli_K12).
A list of query genes.
Return sequences for all the genes of the organism if value = 1. Incompatible with query.
Prevent overlap with upstream open reading frames (ORF) if value = 1.
Inferior limit of the region to retrieve. Default is organism dependent (example: Saccharomyces cerevisiae = -800).
Superior limit of the region to retrieve. Default is '-1'.
Type of genome features to load. Supported: CDS, mRNA, tRNA, rRNA.
Sequence type. Supported: upstream, downstream, ORF (unspliced open reading frame).
Sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, FastA.
Line width (0 for whole sequence on one line).
Field(s) to be used in the sequence label. Multiple fields can be specified, separated by commas. Supported: id, name, organism_name, sequence_type, current_from, current_to, ctg, orf_strand, reg_left, reg_right. Default: name.
Separator between the label fields. Default: | (pipe character).
No comments if value = 1. Only the identifier and the sequence are returned. By default, the comment indicates the ORF and upstream sequence coordinates.
Use the repeat masked version of the genome if value = 1. Warning: repeated regions are annotated for some genomes only.
Admit imprecise positions if value = 1.

Parameters for the operation retrieve_seq_multigenome.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
The input file is a tab-delimited text file with (at least) the following two columns:
1. Gene ID or name
Identifiers or synonyms are supported.
2. Organism name
For the organism name, spaces must be replaced by underscore characters
(exactly as for retrieve-seq).
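As an illustration, the two-column layout described above could be read as follows. This is a minimal Python sketch, not part of the service: the function name parse_gene_organism_table and its arguments are hypothetical, and replacing spaces by underscores on the client side is an assumption made for convenience.

```python
def parse_gene_organism_table(text, gene_col=1, org_col=2):
    """Extract (gene, organism) pairs from tab-delimited text.

    Column numbers are 1-based. Columns other than the two selected
    ones are simply not read.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) < max(gene_col, org_col):
            raise ValueError(f"not enough columns in line: {line!r}")
        gene = fields[gene_col - 1].strip()
        # Assumption: spaces in organism names are converted here,
        # since the service expects underscore-separated words.
        organism = fields[org_col - 1].strip().replace(" ", "_")
        pairs.append((gene, organism))
    return pairs


if __name__ == "__main__":
    example = "lexA\tEscherichia coli K12\textra\nGAL4\tSaccharomyces_cerevisiae\n"
    for gene, organism in parse_gene_organism_table(example):
        print(gene, organism)
```

The gene_col and org_col arguments only mirror the idea of selectable column numbers; how the service itself handles malformed lines is not specified here.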
If additional columns are included in the input file, they are ignored.
Input file on the server.
Return sequences for all the genes of the organism if value = 1. Incompatible with query.
Prevent overlap with upstream open reading frames (ORF) if value = 1.
Inferior limit of the region to retrieve. Default is organism dependent (example: Saccharomyces cerevisiae = -800).
Superior limit of the region to retrieve. Default is '-1'.
Type of genome features to load. Supported: CDS, mRNA, tRNA, rRNA.
Sequence type. Supported: upstream, downstream, ORF (unspliced open reading frame).
Sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, FastA.
Line width (0 for whole sequence on one line).
Field(s) to be used in the sequence label. Multiple fields can be specified, separated by commas. Supported: id, name, organism_name, sequence_type, current_from, current_to, ctg, orf_strand, reg_left, reg_right. Default: name.
Separator between the label fields. Default: | (pipe character).
No comments if value = 1. Only the identifier and the sequence are returned. By default, the comment indicates the ORF and upstream sequence coordinates.
Use the repeat masked version of the genome if value = 1. Warning: repeated regions are annotated for some genomes only.
Admit imprecise positions if value = 1.
Number of the column containing the gene names/identifiers (default: 1).
Number of the column containing the organisms (default: 2).

Parameters for the operation retrieve_ensembl_seq.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Organism. Words need to be underscore separated (example: Escherichia_coli_K12).
Address of the EnsEMBL database server (default is EBI server).
Name of EnsEMBL database (alternative to organism).
A list of query genes. You need to supply either this parameter or the next one (tmp_infile).
Name of the file with list of genes on the server. You need to supply either this parameter or the previous one (query).
Return sequences for all the genes of the organism if value = 1. Incompatible with query.
Prevent overlap with upstream open reading frames (ORF) if value = 1.
Prevent overlap with upstream gene (extreme transcript limits) if value = 1.
Inferior limit of the region to retrieve. Default is organism dependent (example: Saccharomyces cerevisiae = -800).
Superior limit of the region to retrieve. Default is '-1'.
Type of genome features to load. Supported: Gene, CDS, mRNA, intron, exon.
Sequence type. Supported: upstream, downstream.
Chromosome name or number (to use with -left and -right).
Left limit of sequence to retrieve.
Right limit of sequence to retrieve.
Strand of sequence to retrieve when using -left and -right. Values: 1, -1.
Features.
Features format. Supported: ft, gft.
All coding sequence is replaced by N in the retrieved sequence if value = 1.
Use the repeat masked version of the genome if value = 1. Warning: repeated regions are annotated for some genomes only.
Get sequences for all transcripts of genes if value = 1. Combine with unique_sequences option if you do pattern discovery afterwards.
When getting sequences for all transcripts of genes, keep only non-redundant fragments if value = 1.
With feattype intron, get only first intron sequence if value = 1.
With feattype exon, get only non-coding (part of) exons if value = 1.
With feattype UTR, get only 5prime or 3prime UTR (default is all UTRs).
A newline character will be inserted in the sequence every ## bases. 0 will prevent newline insertion. This is the default value.
Get orthologous sequences if value = 1.
Filter on taxonomic level when collecting orthologs (e.g. Murinae).
Filter on homology type when collecting orthologs (e.g. ortholog_one2one).
Type of organism name to use in the fasta header (scientific, common or none). Default is scientific. Common name is only accessible with -ortho.

Parameters for the operation purge_sequence.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Sequence to purge. You need to supply either this parameter or the next one (tmp_infile).
Name of the file with input sequence on the server. You need to supply either this parameter or the previous one (sequence).
Sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, FastA.
Minimal match length. Default is 40.
Number of mismatches allowed. Default is 3.
Discard duplications on the direct strand only (1) or on the reverse complement as well (2). Default is 2.
Delete repeats instead of masking them if value = 1.
Mask (replace by N characters) sequences shorter than the specified length.

Parameters for the operation oligo_analysis.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Verbosity.
Input sequence. You need to supply either this parameter or the next one (tmp_infile).
Name of the file with input sequence on the server. You need to supply either this parameter or the previous one (sequence).
Input sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, fasta. Default is fasta.
Oligomer length.
Organism. Words need to be underscore separated (example: Escherichia_coli_K12).
Background model: Type of sequences used as background model for estimating expected oligonucleotide frequencies. Supported: upstream, upstream-noorf, upstream-noorf-rm, intergenic, input.
List of statistics to return. Supported: occ, mseq, freq, proba, ratio, zscore, like, pos, rank.
No overlapping of oligos allowed if value = 1. Disable the detection of overlapping matches for self-overlapping patterns (e.g. TATATA, GATAGA).
Oligonucleotide occurrences found on both strands are summed (2) or not (1). Default is 2.
Sort oligomers according to overrepresentation if value = 1.
Lower threshold on some parameters. Format = list of 'parameter value'.
Upper threshold on some parameters. Format = list of 'parameter value'.
Pseudo-weight. Must be a real value between 0 and 1.

Parameters for the operation oligo_diff.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Verbosity.
Input test sequence in fasta format. You need to supply either this parameter or the next one (tmp_test_infile).
Name of the file with input test sequence on the server. You need to supply either this parameter or the previous one (test).
Input control sequence in fasta format. You need to supply either this parameter or the next one (tmp_control_infile).
Name of the file with input control sequence on the server. You need to supply either this parameter or the previous one (control).
Side of the significance test (values: test, ctrl, both). In practice, the side is converted into a threshold on the ratio of test/control occurrences.
test: only tests over-representation in the test sequences. This is converted into a lower threshold of 1 for the test/control ratio.
both: tests over-representation in either the test or the control set.
ctrl: only tests over-representation in the control sequences. This is converted into an upper threshold of 1 for the test/control ratio.
Do not purge input sequences before counting oligonucleotide occurrences if value = 1. Input sequences are purged by default; this is highly recommended, since redundant sequence fragments bias the overrepresentation statistics and create false positives.
Oligomer length.
No overlapping of oligos allowed if value = 1. Disable the detection of overlapping matches for self-overlapping patterns (e.g. TATATA, GATAGA).
Oligonucleotide occurrences found on both strands are summed (2) or not (1). Default is 2.
Lower threshold on some parameters. Format = list of 'parameter value'.
Upper threshold on some parameters. Format = list of 'parameter value'.

Parameters for the operation chip_motifs.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Verbosity.
Input test peak sequence in fasta format. You need to supply either this parameter or the next one (tmp_test_infile).
Name of the file with input test peak sequence on the server. You need to supply either this parameter or the previous one (test).
Input control peak sequence in fasta format. You can supply either this parameter or the next one (tmp_control_infile), or neither.
Name of the file with input control peak sequence on the server. You can supply either this parameter or the previous one (control), or neither.
Maximal sequence length. Larger sequences are truncated at the specified length around the sequence center (from -value/2 to value/2).
Maximal number of motifs (matrices) to return for pattern discovery algorithms. Note the distinction between the maximal number of motifs (matrices) and the maximum number of patterns (words, dyads): a motif generally corresponds to several mutually overlapping patterns (dyads, words).
Reference motif.
In some cases, a reference motif is already available, for example the motif annotated in a transcription factor database (e.g. RegulonDB, JASPAR, TRANSFAC) for the transcription factor of interest. These annotations may come from low-throughput experiments and rely on a small number of sites, but the reference motif may nevertheless be informative, because it is based on several independent studies.
Each discovered motif can be compared to the reference motif, in order to evaluate its correspondence with the binding motif of the factor of interest.
Name of motif database. List of databases of transcription factor binding motifs (e.g. JASPAR, TRANSFAC, RegulonDB, ...) which will be compared to the discovered motifs (task motifs_vs_db). Use supported-motif-databases to check availability.
Restrict the analysis to the N peaks at the top of the input sequence file. Some peak calling programs return the peaks sorted by
score. In such cases, the -top_peaks option makes it possible to restrict the analysis to the highest scoring peaks. In some cases, the top-scoring peaks might contain a higher density of binding sites,
which makes it possible to detect motifs with a higher significance.
This option can also be convenient for performing quick tests, parameter selection and debugging before running the full analysis of large sequence sets.
Minimal oligonucleotide length. Use in combination with the next option (max_length). If those options are used, the program iterates over the specified range of oligonucleotide lengths.
Maximal oligonucleotide length. Use in combination with the previous option (min_length).
If those options are used, the program iterates over the specified range of oligonucleotide lengths.
Order of the Markov model used to estimate expected oligonucleotide frequencies for oligo-analysis and local-word-analysis.
Higher-order Markov models are more stringent; lower-order models are more sensitive, but tend to return a large number of false positives.
Markov models can be specified with either a positive or a negative value. A positive value indicates the length of the prefix in the transition matrix. A negative value indicates the order of the Markov model relative to the oligonucleotide length. For example, the option -markov -2 gives a model of order m=k-2 (thus, an order 5 for heptanucleotides, an order 4 for hexanucleotides).
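The rule for positive and negative Markov values can be sketched as a small helper. This is only an illustration in Python: the function name resolve_markov_order is hypothetical, treating 0 as an absolute order is an assumption, and any upper bound the programs may impose on the order is not checked here.

```python
def resolve_markov_order(markov: int, oligo_length: int) -> int:
    """Return the absolute Markov order m for oligonucleotides of length k.

    Non-negative values are used as the order itself (assumption: this
    includes 0). Negative values are relative to the oligonucleotide
    length, i.e. m = k + markov, so -2 means m = k - 2.
    """
    if oligo_length < 1:
        raise ValueError("oligo_length must be at least 1")
    order = markov if markov >= 0 else oligo_length + markov
    if order < 0:
        raise ValueError(
            f"markov={markov} gives a negative order for k={oligo_length}"
        )
    return order


if __name__ == "__main__":
    for k in (6, 7):
        print(f"k={k}, -markov -2 -> order {resolve_markov_order(-2, k)}")
```

With -markov -2, the helper returns 5 for heptanucleotides and 4 for hexanucleotides, matching the example given in the text.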
The optimal Markov order depends on the number of sequences in the test set. Since ChIP-seq data typically contain hundreds to thousands of peaks, high Markov orders are generally good, because they are stringent and still sensitive enough. In our experience, motifs are well detected with the most stringent Markov order (-markov -2).
Minimal value for markov order. Use in combination with the next option (max_markov). If those options are used, the program iterates over the specified range of markov orders.
Maximal value for markov order. Use in combination with the previous option (min_markov). If those options are used, the program iterates over the specified range of markov orders.
No overlapping of oligos allowed if value = 1. Disable the detection of overlapping matches for self-overlapping patterns (e.g. TATATA, GATAGA).
Class interval for position-analysis. The width of the position classes, in number of bases (default: 20).
Oligonucleotide occurrences found on both strands are summed (2) or not (1). Default is 2.
Title displayed on top of the graphs.
Image format. All the formats supported by XYgraph can be used.
oligos|dyads|positions|local_words|merged_words|meme|chipmunk
Specify the software tool(s) that will be used for motif discovery.
Several algorithms can be specified either by using the option iteratively:
-disco oligos -disco dyads
or by entering a comma-separated list of algorithms:
-disco oligos,dyads
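Both ways of specifying the algorithms lead to the same selection. The following Python sketch shows one way a client could normalize the values before building a request; the function name parse_disco is hypothetical, and the list of supported names is copied from the option values listed above.

```python
SUPPORTED_ALGORITHMS = (
    "oligos", "dyads", "positions", "local_words",
    "merged_words", "meme", "chipmunk",
)


def parse_disco(values):
    """Normalize -disco values given repeatedly and/or as comma-separated lists.

    Returns the selected algorithm names in order of first appearance,
    without duplicates.
    """
    selected = []
    for value in values:
        for name in value.split(","):
            name = name.strip()
            if not name:
                continue  # tolerate stray commas
            if name not in SUPPORTED_ALGORITHMS:
                raise ValueError(f"unsupported algorithm: {name!r}")
            if name not in selected:
                selected.append(name)
    return selected


if __name__ == "__main__":
    print(parse_disco(["oligos", "dyads"]))  # option used repeatedly
    print(parse_disco(["oligos,dyads"]))     # comma-separated list
```

Rejecting unknown names early is a design choice of this sketch; how the server itself reports an unsupported algorithm is not described here.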
Default motif discovery algorithms
oligos: Run oligo-analysis to detect over-represented oligonucleotides of a given length (k, specified with option -l) in the test set (van Helden et al., 1998). Prior frequencies of oligonucleotides are taken from Markov model of order m (see option -markov) estimated from the test set sequences themselves.
dyads: Run dyad-analysis to detect over-represented dyads, i.e. pairs of short oligonucleotides (monads) spaced by a region of fixed width but variable content (van Helden et al., 2000). Spaced motifs are typical of certain classes of transcription factors forming homo- or heterodimers. By default, chip-seq-analysis analyzes pairs of trinucleotides with any spacing between 0 and 20. The expected frequency of each dyad is estimated as the product of its monad frequencies in the input sequences (option -bg monads of dyad-analysis).
positions: Run position-analysis to detect oligonucleotides showing a positional bias, i.e. have a non-homogeneous distribution in the peak sequence set.
This method was initially developed to analyze termination and poly-adenylation signals in downstream sequences (van Helden et al., 2001), and it turns out to be very efficient for detecting motifs centred on the ChIP-seq peaks. For ChIP-seq analysis, the reference position is the center of each sequence.
Note that chip-seq-analysis also uses position-analysis for the task profiles, in order to detect compositional biases (residues, dinucleotides) in the test sequence set.
local_words: Run local-word-analysis to detect locally over-represented oligonucleotides and dyads.
The program local-word-analysis (Matthieu Defrance, unpublished) tests the over-representation of each possible word (oligo, dyad) in positional windows in the input sequence set.
Two types of background models are supported: (i) Markov model of order m estimated locally (within the window under consideration); (ii) the frequency observed for a word in the whole sequence set is used as estimator of the prior probability of this word in the window.
In our first trials, this program gave excellent results on ChIP-seq datasets, because its sensitivity increases with the number of sequences (several hundreds/thousands), and its background model is more stringent than that of programs computing the global over-representation (oligo-analysis, dyad-analysis).
merged_words: Extract a position-specific scoring matrix (using matrix-from-patterns) from all the words discovered by the selected string-based motif discovery algorithms (oligos, dyads, positions and/or local_words).
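To make the background model of the dyads algorithm more concrete, the sketch below estimates the expected frequency of a dyad as the product of its two monad frequencies, and counts its observed occurrences for a given spacing. It is a simplified Python illustration, not the dyad-analysis implementation: the function names are hypothetical, only the direct strand is scanned, overlapping occurrences are all counted, and no significance is computed.

```python
from collections import Counter


def monad_frequencies(sequences, k=3):
    """Relative frequency of each k-mer (monad) in the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def expected_dyad_frequency(freqs, left, right):
    """Expected dyad frequency as the product of its monad frequencies."""
    return freqs.get(left, 0.0) * freqs.get(right, 0.0)


def count_dyad(sequences, left, right, spacing):
    """Count occurrences of left + N{spacing} + right on the direct strand."""
    k = len(left)
    span = k + spacing + len(right)
    n = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - span + 1):
            if seq[i:i + k] == left and seq[i + k + spacing:i + span] == right:
                n += 1
    return n


if __name__ == "__main__":
    seqs = ["AAATTCCCGG", "TTAAAGGCCC"]
    freqs = monad_frequencies(seqs)
    print(expected_dyad_frequency(freqs, "AAA", "CCC"))
    print(count_dyad(seqs, "AAA", "CCC", spacing=2))
```

In the real program, the comparison between observed and expected occurrences feeds a significance test; the sketch stops at the two quantities being compared.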
Enter the source of the fasta sequence file.
Supported source: galaxy
When the sequence file comes from Galaxy, peak coordinates embedded in the fasta headers are extracted and used to convert predicted site coordinates (relative to peak center) to genomic coordinates (in the form of a bed file), which can then be uploaded to the UCSC genome browser as an annotation track.
This option is incompatible with -coord.
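The coordinate conversion mentioned above can be sketched as follows. This Python example is an illustration under explicit assumptions: the layout of the Galaxy fasta headers is not described here, so the function takes already-parsed peak coordinates; the peak is assumed to be given as a 0-based, half-open interval; the center is taken as the integer midpoint; and site positions are assumed to be inclusive offsets relative to that center. The function name site_to_bed is hypothetical.

```python
def site_to_bed(chrom, peak_start, peak_end, site_from, site_to, name="site"):
    """Convert a site given relative to the peak center into a BED line.

    peak_start/peak_end: genomic interval of the peak (0-based, half-open).
    site_from/site_to: inclusive site limits relative to the peak center.
    """
    if site_from > site_to:
        raise ValueError("site_from must not exceed site_to")
    center = (peak_start + peak_end) // 2  # assumption: integer midpoint
    start = center + site_from             # BED start is 0-based
    end = center + site_to + 1             # BED end is exclusive
    return f"{chrom}\t{start}\t{end}\t{name}"


if __name__ == "__main__":
    # A site spanning offsets -10 to -3 in a peak at chr1:1000-1200
    print(site_to_bed("chr1", 1000, 1200, -10, -3, "motif_1"))
```

The conventions actually used by the program (center definition, inclusive or exclusive limits) should be checked against its output before relying on such a conversion.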
Specify a subset of tasks to be executed.
By default, the program runs all necessary tasks. However, in some cases, it can be useful to select one or several tasks to be executed separately.
Beware: task selection requires expertise, because most tasks depend on the prior execution of other tasks in the workflow. Selecting tasks before their prerequisite tasks have been completed will cause fatal errors.
Available Tasks:
all (default): Run all supported tasks.
purge: Purge input sequences (test set and, if specified, control set) to mask redundant fragments before applying pattern discovery algorithms. Sequence purging is necessary because redundant fragments would violate the hypothesis of independence underlying the binomial significance test, resulting in a large number of false positive patterns.
seqlen: Compute sequence lengths and their distribution. Sequence lengths are useful for the negative control (selection of random genome fragments). Sequence length distribution is informative to get an idea about the variability of peak lengths.
composition: Compute compositional profiles, i.e. distributions of residues and dinucleotide frequencies per position (using position-analysis).
Residue profiles may reveal composition biases in the neighborhood of the peak sequences. Dinucleotide profiles can reveal, for example, an enrichment in CpG islands.
Note that peak-motifs also runs position-analysis with larger oligonucleotide length (see option -l) to detect motifs on the basis of positionally biased oligonucleotides (see task positions).
ref_motifs: This task combines various operations.
Formatting of the reference motif
Perform various format conversions for the reference motif (compute parameters, consensus, logo).
Motif enrichment
Generate an enriched motif by scanning the peak sequence set with the reference motif.
Motif comparison
Compare all discovered motifs with the reference motif.
disco: Run the motif discovery algorithms. See option -disco for the selection of motif discovery algorithm(s).
merge_words: Merge the words (oligos or dyads) discovered by the different string-based motif discovery algorithms.
The table of merged words has one row per word (oligo or dyad) and one column per motif discovery program. This table is convenient to analyze the consistency between the words detected by different approaches, e.g. to show that a word is both over-represented (oligo-analysis, dyad-analysis) and positionally biased (position-analysis, local-words). A heatmap is also exported to provide a graphical representation of the significance of each word (row) for each algorithm (column).
The merged words can optionally be used as seeds for extracting position-specific scoring matrices from the sequences, using the program matrix-from-patterns (see option -disco merged_words).
motif_compa: Motifs are compared in three ways.
Discovered versus discovered (task cluster_motifs)
Perform pairwise comparisons between all motifs (matrices) discovered by the different algorithms, to assess their consistency.
motifs_vs_ref:
Compare each discovered motif to the reference motif.
motifs_vs_db:
Compare each discovered motif to a database of known motifs (e.g. Jaspar, TRANSFAC, RegulonDB, UniProbe, ...)
timelog: Generate a log file summarizing the time spent in the different tasks.
synthesis: Generate the HTML file providing a synthesis of the results and pointing towards the individual result files.
clean_seq: Delete the purged sequence files after the analysis, in order to save space. This task is executed only when it is called explicitly. It is not among the tasks run with the option "-task all".

Parameters for the operation dyad_analysis.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Verbosity.
Input sequence. You need to supply either this parameter or the next one (tmp_infile).
Name of the file with input sequence on the server. You need to supply either this parameter or the previous one (sequence).
Input sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, fasta. Default is fasta.
Dyad length.
Spacing between elements of the dyads.
Organism. Words need to be underscore separated (example: Escherichia_coli_K12).
Background model: Type of sequences used as background model for estimating expected oligonucleotide frequencies. Supported: upstream, upstreamL, upstream-noorf, intergenic, input.
List of statistics to return. Supported: occ, mseq, freq, proba, ratio, zscore, like, pos, rank.
dyad_type (dr | ir | rep | any)
In order to speed up execution, the program can be asked to restrict its analysis to symmetric dyads.
Four types are accepted:
dr - direct repeats: the second element is the same as the first one;
ir - inverted repeats: the second element is the reverse complement of the first one;
rep - repeats: direct and inverted repeats are evaluated;
any - (default)
When selecting the option any, the analysis is performed on all non-symmetric dyads as well.
No overlapping of dyads allowed if value = 1. Disable the detection of overlapping matches for self-overlapping patterns (e.g. TATATA, GATAGA).
Dyad occurrences found on both strands are summed (2) or not (1). Default is 2.
Sort dyads according to overrepresentation if value = 1.
Detect under-represented instead of over-represented dyads (left-tail significance test) if value = 1.
Detect under-represented and over-represented dyads (two-tail significance test) if value = 1.
Report also dyads with zero occurrences (provided they fit the other thresholds) if value = 1.
By default, the program reports only patterns present in the sequence.
If the left-tail or two-tail test is applied, patterns with zero occurrences are automatically taken into account.
In some other cases, one would also like to detect patterns absent from the sequence.
This is the function of the option -zeroocc.
Lower threshold on some parameters. Format = list of 'parameter value'.
Upper threshold on some parameters. Format = list of 'parameter value'.

Parameters for the operation position_analysis.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Verbosity.
Input sequence. You need to supply either this parameter or the next one (tmp_infile).
Name of the file with input sequence on the server. You need to supply either this parameter or the previous one (sequence).
Input sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, fasta. Default is fasta.
Oligomer length.
Sequence type (values: dna, any).
Stop after a given number of sequences (for quick testing).
Mask lowercase or uppercase letters by replacing the selected case by N (values: lower, upper).
No overlapping of oligos allowed if value = 1. Disable the detection of overlapping matches for self-overlapping patterns (e.g. TATATA, GATAGA).
Oligonucleotide occurrences found on both strands are summed (2) or not (1). Default is 2.
Class interval. The width of the position classes, in number of bases (default: 20).
Reference for computing positions. The value should be chosen according to the sequence type (start, for downstream sequences; end, for promoter sequences; center, for ChIP-seq peaks).
Add an offset to site position (positive or negative integer). This makes it possible to select an arbitrary position as origin.
Group reverse complement pairs if value = 1.
Sort oligomers according to the bias in distribution profile if value = 1.
Fields to return. Several fields can be entered, comma separated (values: distrib, for occurrences found in each position class; exp, for expected occurrences in each class; graph, for a graph file per oligo profile; chi, for chi-square value; rank).
Lower threshold on chi2.
Lower threshold on significance.
Lower threshold on the number of occurrences.
Upper threshold on rank.
Maximum number of graphs to export.
A selection of patterns you want the analysis to be restricted to. Newline separated. A score can be associated to each pattern with the option score_column.
Name of the file with input patterns on the server. You need to supply either this parameter or the previous one (pattern).
The column containing a score value for each pattern supplied with options pattern or tmp_pattern_infile. Only valid in combination with one of these options.
Minimal position to take into account for the chi2 computation. This value must be a multiple of the class interval.
Maximal position to take into account for the chi2 computation. This value must be a multiple of the class interval.
Do not check the applicability condition on the chi2 if set to 1.
Do not discard oligos which do not fit the condition of applicability. Instead, mark them by including the chi2 value between curly brackets if set to 1.
Image format passed to XYgraph.
Title for the index table and position profile plots.

Parameters for the operation pattern_assembly.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Input data.
Input file on the server (workflow usage).
Verbosity.
Score column. Column of the input file that contains the scores.
If not specified, patterns are incorporated according to their order in the input file.
However, if the input file has been generated with oligo-analysis or dyad-analysis with a verbosity >= 1,
pattern-assembly detects the occ_sig column in the file header and uses this column as score column.
Strands for the assembly: 1 for single-strand; 2 for two-strand assembly. Default is 2.
Maximum flanking segment size. Default is 1.
Maximum allowed substitutions. Default is 0.
Maximum number of assemblies. Default is 0 (no limit).
Maximum assembly size, i.e. number of patterns per assembly. Default is 50.
Maximum number of allowed patterns. If the number of patterns exceeds this value, the program issues a fatal error. Default is 0 (no limit). Not compatible with toppat option.
Maximum number of patterns to analyze. If the number of patterns in the input exceeds this value, the assembly is restricted to the top patterns only. Default: 100. Not compatible with maxpat option.

Parameters for the operation dna_pattern.
Return type.
Accepted values: 'server' (result is stored in a file on the server), 'client' (result is transferred directly to the client),
'both' (result is stored on the server and transferred to the client), and 'ticket' (an identifier is returned to the client, which can use it to monitor the job status and retrieve the result when the job is done).
Default is 'both'.
Input sequence. You need to supply either this parameter or the next one (tmp_infile).
Name of the file with input sequence on the server. You need to supply either this parameter or the previous one (sequence).
Input sequence format. Supported: IG (Intelligenetics), WC (wconsensus), raw, fasta. Default is fasta.
Number of substitutions allowed.
Pattern to match. Use this option or the 'pattern_file' option.
File with patterns to match. Use this option or the 'pattern' option.
File located on the server with patterns to match (workflow usage).
Pattern identifier.
Origin for the calculation of positions (0 for end of sequence).
No overlapping of oligos allowed if value = 1. Disable the detection of overlapping matches for self-overlapping patterns (e.g. TATATA, GATAGA).
Score column. Column of the pattern file which contains the score.
Oligonucleotide occurrences found on both strands are summed (2) or not (1). Default is 2.
Sort oligomers according to overrepresentation if value = 1.
Threshold on match count.
List of fields to return. Multiple fields can be entered separated by commas.
Supported fields: colsum, counts, ct, limits, profiles, rank, rowsum, scores, sites, stats, table, total.
sites: return match positions (default)
limits: return start and end positions for each input sequence
counts: return the count of matches per sequence
rank: return the rank of the sequence (this is especially useful in combination with the option -sort)
scores: return a score per sequence, computed by summing the scores of the matching patterns
ct: same as '-return counts', except that it returns the sum of matches in all the files of the sequence file list, instead of the count within each separate file
table: return the count of pattern matches per sequence in the form of a table. (one line per sequence, one column per pattern)
colsum: (together with -return table) prints an extra column with the total occurrences per sequence
rowsum: (together with -return table) prints an extra row with total occurrences per pattern
total: (together with -return table) prints an extra column with the total occurrences per sequence and
an extra row with total occurrences per pattern (equivalent to combining colsum and rowsum).
stats: return matching statistics
profiles: return matching profiles with sliding windows.Parameters for the operation convert_features.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Input dataInput file on the server (workflow usage)Input format. Supported: dnapat,ft,gft,gff3,gff.Output format. Supported: dnapat,ft,gft,gff3,gff.Bedfile with absolute coordinate of the sequence fragment.Parameters for the operation feature_map.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.a list of features (ORFs, regulatory sites). Each feature is represented by a single line, which should provide the following information:
Input file columns:
1. map label (eg gene name)
2. feature type
3. feature identifier (ex: GATAbox, Abf1_site)
4. strand (D for Direct, R for Reverse),
5. feature start position
6. feature end position
7. (optional) description
8. (optional) score
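The column layout listed above can be read with a few lines of Python. This is a minimal sketch, not part of the service: it assumes tab-separated columns in the listed order, and the names `Feature` and `parse_feature_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """One feature line; field names are illustrative, not defined by the service."""
    map_label: str
    feature_type: str
    identifier: str
    strand: str            # "D" (direct) or "R" (reverse), stored as given
    start: int             # may be negative
    end: int               # may be negative
    description: Optional[str] = None
    score: Optional[float] = None


def parse_feature_line(line: str) -> Feature:
    """Split one tab-separated line into the eight documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")
    # Columns 7 and 8 are optional; empty strings are treated as missing.
    description = fields[6] if len(fields) > 6 and fields[6] else None
    score = float(fields[7]) if len(fields) > 7 and fields[7] else None
    return Feature(fields[0], fields[1], fields[2], fields[3],
                   int(fields[4]), int(fields[5]), description, score)


feature = parse_feature_line("PHO5\tsite\tGATAbox\tD\t-120\t-115\tputative site\t4.2")
print(feature.identifier, feature.start, feature.end, feature.score)
```

The sample line (gene label, description and score) is invented for illustration.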
The standard input format assumes that these fields are provided in this order, separated by tabs. Start and end positions can be positive or negative.
Name of the file with input features on the server (workflow usage). You need to supply either this parameter or the previous one (features).
Input sequence(s).
Reference sequence file on the server (workflow usage).
Format of reference sequence file on the server.
Output image format. Supported: png,jpg,ps,gif (default = jpg).
Lower limit of the positions represented on the graph.
Upper limit of the positions represented on the graph.
Generic title for the feature map.
Define the info to display for each feature. Valid keys are: id, strand, descr (feature description), pos (feature start and end positions).
Several keys can be entered separated by commas without spaces, e.g. -label pos,id (default = id).
Associates a graphical symbol (e.g. rectangle, circle, butterfly, ...) with each feature. This is convenient for distinguishing the features on black and white printouts.
Mutually exclusive with the -dot option.
A color dot is associated with each feature. This makes it possible to distinguish overlapping structures on a color screen. Mutually exclusive with the -symbol option.
Map length (in pixels). Default is 600. Length refers to either height (for vertical maps) or width (for horizontal maps).
Map thickness. Thickness refers to either width (for vertical maps) or height (for horizontal maps). This parameter changes the thickness allocated to each map.
This is useful when labels are too large. Default is 150.Map spacing.The size of the border between maps (in pixel).All coordinates are recalculated relative to this origin.This allows to display all coordinates with respect to the ORF start or transcription start site.Draws a legend on the graph, showing the symbol associated to each distinct feature.Draws a scale bar on the left of the graph.Step between annotations of the scale bar. If not specified, a reasonable step is calculated on basis of the scale bar range.Each feature is displayed with a thickness proportional to its score. Only positive scores are represented.(only valid when -scorethick is active) Maximal allowed score value. Higher score values are clipped for the drawing.(only valid when -scorethick is active) Minimal allowed score value. Features with smaller score are not displayed.Max feature thicknessMin feature thicknessHTML map. An HTML document is automatically generated, which includes the feature map GIF file as an HTML map.
In other words, this document displays a figure with sensitive areas. Each time the mouse is positioned above a feature,
information about this particular feature is displayed at the bottom of the browser window.
Monochrome palette (for printing on black/white printer).
Orientation of the map. Valid values are "horiz" for a horizontal map (default) and "vertic" for a vertical map.
Only display the features whose ID is in the provided id_list.
The id_list contains one or several IDs, separated by commas. IDs may be enclosed in single quotes to allow multiple words within the IDs.
Commas and single quotes are not allowed within an ID.
Example: -select 'gataag','gattag' only displays features identified by gataag or gattag.Parameters for the operation footprint_discovery.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Level of detail for commentsA list of genes (one by line).
Name of the file with a list of genes on the server. You need to supply either this parameter or the previous one (genes).Automatically analyze all the genes of a query genome, and store each result in a separate folder (the folder name is defined automatically).Maximum number of genes to analyze.Prefix for the output files. If the prefix is not specified, the program can guess a default prefix, but this is working only if there is a single query gene or query file.A list of genes (referenced array).
Search footprints for each query gene separately. The results are stored in a separate folder for each gene. The folder name is defined automatically.
Query organism, to which the query genes belong.
Reference taxon, in which orthologous genes have to be collected.
Generate an HTML index with links to the result files.
This option is used for the web interface, but can also be convenient to index results, especially when several genes or taxa are analyzed (options -genes, -all_genes, -all_taxa).Lower threshold on some parameters. Format: parameter value.Upper threshold on some parameters. Format: parameter value.Return fields for dyad-analysis. See dyad_analysis for a listConvert assembled patterns into position-specific scoring matrices (PSSM).
Caution: this conversion can take time if the sequence set is large and if there are many assemblies.
Allow the user to choose among alternative background models.
- taxfreq
Taxon-wide background model, computed by counting dyad frequencies in all the promoters of all the genes of the reference taxon.
- monads
Expected dyad frequencies are the product of monad frequencies observed in the input sequences.Accept all dyads, even if they are not found in the promoter of the query gene, in the query organism.Infer operons in order to retrieve the promoters of the predicted operon leader genes rather than those located immediately upstream of the orthologs.
This method uses a threshold on the intergenic distance.Specify here the intergenic distance threshold in base pairs.
Pairs of adjacent genes with an intergenic distance less than or equal to this value are predicted to belong to the same operon (default: 55).
Parameters for the operation get_orthologs.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Organism. Words need to be underscore separated (example: Escherichia_coli_K12).Reference taxon. Orthologs are returned for each supported organism belonging to the reference taxon.List of gene(s) for which you want to get orthologs.Get orthologs for all the genes of the query organism if value = 1. This option is particularly convenient to extract phylogenetic profiles.Disable the pre-filtering with grep if value = 1.
This pre-filtering accelerates the selection of hits, but some grep versions do not support the -E option.
If this is the case on your operating system, you can still obtain the correct results by inactivating the grep filter.Output field(s).
By default, the output is a two-column file indicating the ID of the gene identified as similar to the query gene, and the name of the reference organism.
The option -return can be used to specify additional output fields.
Supported fields:
- ref_id: ID of the reference (target) gene
- ref_organism: Name of the reference (target) organism
- query_id: ID of the query gene
- query_organism: Name of the query organism
- ident: Percent of identity (a number between 0 and 100)
- ali_len: Alignment lengths (in residues)
- mismat: Number of mismatches
- gap_open: Number of gap openings
- e_value: E-value (expected number of false positives)
- bit_sc: Bit score
- rank: Rank
- s_rank: Source rank (rank of the hit for the query organism).
Several output fields can be entered separated by commas.Lower threshold for dyad-analysis. Format: list of 'parameter value'.Upper threshold for dyad-analysis. Format: list of 'parameter value'.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Organism. Words need to be underscore separated (example: Escherichia_coli_K12).List of gene(s) for which you want to infer the operon.Name of the file with query genes on the server. You need to supply either this parameter or the previous one (query).Infer operons for all the genes of the query organism if value = 1.Distance threshold.Min number of genes to report the operon.List of fields to return.
Supported fields: leader,trailer,operon,query,q_info,up_info,down_info
- leader: Predicted operon leader.
- trailer: Predicted operon trailer.
- operon: Full composition of the operon. The names of member genes are separated by a semicolon ";".
- q_info: Detailed info on the query gene(s).
- up_info: Detailed info on the upstream gene.
- down_info: Detailed info on the downstream gene.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Organism. Words need to be underscore separated (example: Escherichia_coli_K12).
List of gene(s) for which you want info, or list of keywords to search for (can be regular expressions).
Full match only (no substring matching) if value = 1.
Do not print the query at the beginning of each line if value = 1.
Match query against the description, too, not just against gene ID and name if value = 1.
Feature type (CDS, mRNA, tRNA, rRNA, scRNA).
Get info for all genes if value = 1.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input file.
Class interval. If not specified, takes the value (max - min)/20.
Column to which the program is applied. This option can be used iteratively.
Numbers strictly smaller than this value are not taken into account.
Numbers strictly higher than this value are not taken into account.
Inferior limit for the classes to display. Values lower than this limit are nevertheless taken into account in the calculation of statistics (avg, variance, ...)
and of class frequencies (in contrast with the -min option).
Superior limit for the classes to display. Values higher than this limit are nevertheless taken into account in the calculation of statistics (avg, variance, ...)
and of class frequencies (in contrast with the -max option).
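The difference between the filtering limits and the display limits can be illustrated with a short Python sketch. It is not the service's implementation. It assumes that classes start at integer multiples of the interval and that a class is displayed when its lower bound lies within the display limits. The function name `class_frequencies` and its parameter names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def class_frequencies(
    values: Sequence[float],
    interval: Optional[float] = None,
    vmin: Optional[float] = None,       # values strictly below are discarded
    vmax: Optional[float] = None,       # values strictly above are discarded
    show_from: Optional[float] = None,  # display limit only
    show_to: Optional[float] = None,    # display limit only
) -> Tuple[List[Tuple[float, int, float]], int]:
    """Return ([(class lower bound, count, relative frequency)], total count)."""
    kept = [v for v in values
            if (vmin is None or v >= vmin) and (vmax is None or v <= vmax)]
    if not kept:
        return [], 0
    if interval is None:
        # Documented default: (max - min) / 20
        interval = (max(kept) - min(kept)) / 20
    if interval <= 0:
        raise ValueError("class interval must be positive")
    # Assumption: class k covers [k * interval, (k + 1) * interval)
    counts = Counter(math.floor(v / interval) for v in kept)
    total = len(kept)
    rows = []
    for k in sorted(counts):
        lower = k * interval
        if show_from is not None and lower < show_from:
            continue  # hidden, but still counted in `total`
        if show_to is not None and lower > show_to:
            continue
        rows.append((lower, counts[k], counts[k] / total))
    return rows, total


data = [0, 1, 2, 3, 10, 12]
print(class_frequencies(data, interval=5))               # all classes
print(class_frequencies(data, interval=5, vmin=1))       # 0 is discarded
print(class_frequencies(data, interval=5, show_from=5))  # first class hidden only
```

With `vmin` the discarded value no longer contributes to the total, whereas with `show_from` the hidden class still does.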
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.The input file should contain columns with numeric data. Each line contains info about one point of the graph.
By default, the first column is considered to contain X data, and the second column Y data. X and Y columns can be changed with -xcol and -ycol options.
Columns should be separated by tabs.
Supported: png,pdf,jpg,eps,gif.
First graph title.
Second graph title.
Points are joined by lines.
Use the content of the first line from input file as legend for Y data.
First line of the data file contains a column header. If option -legend is active, this header is used as legend, else it is ignored.
First x legend.
Second x legend.
First y legend.
Second y legend.
Maximal value represented on X axis.
Maximal value represented on Y axis.
Minimal value represented on X axis.
Minimal value represented on Y axis.
Y data are displayed on a logarithmic scale. If the next argument is a number, it provides the log base. Default log base is 10.
X data are displayed on a logarithmic scale. If the next argument is a number, it provides the log base. Default log base is 10.
Column containing data for the X axis.
A zero value indicates that there is no column with X values. In this case, X values are ordinal.
Column containing data for the Y axis.
Several columns can be specified by -ycol #,#,#. A range of columns can be specified by -ycol #-#. They have to be separated by commas without spaces.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
The input file.
Column of the input file containing the score value (default 1).
Column of the input file containing the status label (default 2).
This option allows different labels to be used as synonyms for the pre-defined statuses: pos and neg.
It can be useful to rename these labels, for compatibility with other programs.
For example, if your input file contains annotations of "site" and "non-site", you can use it directly as input with the options -status site pos -status non-site neg,
which indicate that the label "site" has to be understood as positive, and "non-site" as negative.
Total number of elements in the universe (neg + pos).
This option allows to manually specify the total number of elements, in case the input file would not contain the complete data set.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
The text input file that will be converted to HTML.
Chunk size (when there are many rows, the program splits the table into several HTML tables, to reduce the waiting time in the browser).
Make the output HTML table not sortable.
Use fixed or variable fonts in HTML. Supported: variable,fixed.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
The psi XML input file that will be converted to a tab file.
A list of evidence channels to filter, separated by commas.
A list of interactor_type to filter (separated by commas): protein, 'small molecule'.
Upper threshold on the value.
Lower threshold on the value.
Parameters for the operation supported_organisms.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Return fields. Supported: ID,name,data,last_update,taxonomy,up_from,up_to,genome,seq_format,source.
Output format. Supported: html_list, html_table, array, text, keys, names, sizes, full, tree, html_tree.
Root taxon.
Only returns organisms from a user-selected source. Example: supported-organisms -source ensembl
Traversal depth for the taxonomic tree. If several organisms are supported in a max-depth taxon, only one is reported.
Parameters for the operation supported_organisms.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Return fields. Supported: name,format,file,descr,version,url.Parameters for the operation convert_seq.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Sequence to convert.Name of the file with input sequence on the server. You need to supply either this parameter or the previous one (sequence).Format of input sequence (embl, fasta, filelist, gcg, genbank, ig, maf, multi, ncbi, raw, tab, wc, wconsensus).Format of output sequence (fasta, filelist, ft, ig, multi, raw, tab, wc, wconsensus).Parameters for the operation compare_classes.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.A tab-delimited text file containing the description of reference classesA tab-delimited text file containing the description of query classesList of fields to return separated by a comma. Supported: dotprod, entropy, freq, jac_sim, members, occ, proba, rankSpecify a column of the input file containing a score associated to each memberThis file will be used as both reference and query.This is equivalent to -q input_file -r input_file.Supported fields: E(QR), E_val, F(!Q!R), F(Q!R), F(Q), F(QR), F(R!Q), F(R), H(Q), H(Q,R), H(Q|R), H(R), H(R|Q),
I(Q,R), IC, P(QR), P(Q|R), P(R|Q), P_val, Q, QR, QvR, R, U(Q|R), U(R|Q), dH(Q,R), dotprod, jac_sim, rank, sigE(QR),
E_val, F(!Q!R), F(Q!R), F(Q), F(QR), F(R!Q), F(R), H(Q), H(Q,R), H(Q|R), H(R), H(R|Q), I(Q,R), IC, P(QR), P(Q|R),
P(R|Q), P_val, Q, QR, QvR, R, U(Q|R), U(R|Q), dH(Q,R), dotprod, dp_bits, jac_sim, log2_dp, names, prodrts, sig, sqrt_dp.
Separate with the ":" character, as some fields contain commas.
Upper threshold value for a supported field. There must be as many threshold values as threshold fields, in the same order as the list of threshold fields was given. Supported fields: same fields as upper_threshold_field.
Lower threshold value for a given field.
Population size. If not specified, the population size is estimated as the number of distinct elements in the whole set of reference classes.
Sort on the basis of the specified key.
Prevent comparing each class with itself (when the reference and query files contain the same classes).
(only valid if query file and reference file are the same) Do not perform the reciprocal comparisons.
Return a pairwise matrix, where each row corresponds to a reference class, each column to a query class, and each cell contains a comparison between the two classes.
The next argument indicates which statistic is returned in the matrix (default = sig).
Supported: E(QR), E_val, F(!Q!R), F(Q!R), F(Q), F(QR), F(R!Q), F(R), H(Q), H(Q,R), H(Q|R), H(R), H(R|Q), I(Q,R), IC, P(QR), P(Q|R), P(R|Q),
P_val, Q, QR, QvR, R, U(Q|R), U(R|Q), dH(Q,R), dotprod, jac_sim, rank, sigFactor used for the multi-testing correction.
Supported values:
nt number of significance tests (default)
nq number of query classes
nr number of reference classes
nc number of comparisons (nc = nq * nr)Parameters for the operation convert-classes.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.Input format : supported tab, profiles, mclOutput format : supported tab, profiles, mclMember column. Column containing the member names in the tab format (default 1).Class column. Column containing the class names in the tab format (default 2).Score column. Column containing the score in the tab format : if not specified, scores are not defined..Minimal score value for member to class assignation.Input classes in the format defined by the input_format tagTwo tab-delimited columns specifying the labels of the members of the classification given in the classification file.
First column contains the identifier and second column the corresponding label.Parameters for the operation contingency_table.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.A tab delimited fileFirst column to use for the contingency tableSecond column to use for the contingency tableCalculate the marginal sumsValue for the null character (default: 0).Parameters for the operation contingency-stats.Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.A contingency table : a N*M table used to compare the contents of two classifications.
Rows represent the clusters of the first classification (considered as reference), and columns the clusters of the second classification (query).Number of decimals to display for the computed statistics.List of fields to return.
stats : table-wise statistics
rowstats : row-wise statistics (one line per row of the contingency table)
colstats : column-wise statistics (one line per column of the contingency table)
tables : full tables for each statistics (counts, Sn, PPV, separation).
margins : marginal statistics besides the tables (requires to return tables).Specify row group sizes in a separate file.
This option can be used in particular cases where the marginal sum of the contingency table does not correspond to the group sizes
(for example if a classification supports the same elements assigned to multiple groups, or on the contrary if some elements can be unassigned).
The row size file must contain one row per row of the contingency table, and two columns. The first column indicated the name of the row
(the same name as in the contingency table), and the second the size of the corresponding group.Specify column group sizes in a separate file. Same description as for -rsizes tag.Parameters for the operation matrix_distribReturn type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
The matrix file content. Matrix format is specified with the option "matrix_format" (see below). Default format: tab.
Name of the file containing matrix on the server. You need to supply either this parameter or the previous one (matrix_file).
Supported fields: tab, cb, consensus, gibbs, meme, assembly.
Pseudo counts to apply on the matrix. Default: 1.
Background model file is a tab-delimited file containing the specification of oligonucleotide frequencies.
Pseudo frequency for the background models. Value must be a real between 0 and 1. Default: 0.01.
Number of decimals for the matrix frequencies.
Background format. Supported formats: all the input formats supported by convert-background-model.
Parameters for the operation compare_matrices.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
The first input containing one or several matrices. Matrix format is specified with the option format1 or format (see below).
The second input containing one or several matrices. Matrix format is specified with the option format2 or format (see below).
Single input containing one or several matrices. Each matrix of this file is compared to every other matrix. Matrix format is specified with the option format (see below).
Name of the first file containing the matrix or matrices on the server. You need to supply either this parameter or matrix_1 (or matrix or tmp_matrix_infile).
Name of the second file containing the matrix or matrices on the server. You need to supply either this parameter or matrix_2 (or matrix or tmp_matrix_infile).
Name of the single file containing the matrix or matrices on the server. You need to supply either this parameter or matrix (or use matrix_1 + matrix_2).
Matrix format for the first input. Supported fields: tab, cb, consensus, gibbs, meme, assembly.
Matrix format for the second input. Supported fields: tab, cb, consensus, gibbs, meme, assembly.
Matrix format for both inputs. Supported fields: tab, cb, consensus, gibbs, meme, assembly.
Background model is tab-delimited and contains the specification of oligonucleotide frequencies.
Background model file on the server. It is a tab-delimited file containing the specification of oligonucleotide frequencies.
Format for the background model file. Supported formats: all the input formats supported by convert-background-model.
Only analyze the first X motifs of the first file. This option is convenient for quick testing before starting the full analysis.
Only analyze the first X motifs of the second file. This option is convenient for quick testing before starting the full analysis.
Prefix for the output files. The output prefix is mandatory for some return fields (alignments, graphs, ...).
This prefix will be appended with a series of suffixes for the different output types (see section OUTPUT FORMATS above for the detail).
-format matches (default)
Return matches between any matrix of file1 and any matrix of file2.
This is the typical use of compare-matrices: comparing one or several query motifs (e.g. obtained from pattern discovery) with a collection of reference motifs (e.g. a database of experimentally characterized transcription factor binding motifs, such as JASPAR, TRANSFAC, RegulonDB, ...).
For a given pair of matrices (one from file1 and one from file2), the program tests all possible offsets, and measures one or several matching scores (see section "(Dis)similarity metrics" above). The program only returns the score of the best alignment between the two matrices. The "best" alignment is the combination of offset and strand (with the option -strand DR) that maximizes the default score (Ncor). Alternative scores can be used as optimality criteria with the option -sort.
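The offset and strand scan described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's algorithm: matrices are lists of columns holding counts in the order A, C, G, T, the score is the plain Pearson correlation over all aligned cells (the normalization used for Ncor is not given here and is not reproduced), and the names `pearson`, `reverse_complement` and `best_alignment` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[Sequence[float]]  # one entry per column, counts ordered A, C, G, T


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when one of the vectors has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def reverse_complement(matrix: Matrix) -> Matrix:
    """Reverse the column order and swap A<->T, C<->G within each column."""
    return [list(reversed(col)) for col in reversed(matrix)]


def best_alignment(m1: Matrix, m2: Matrix,
                   strands: str = "DR") -> Optional[Tuple[float, int, str]]:
    """Try every offset (and strand) and keep the highest-scoring alignment."""
    best = None
    for strand in strands:
        other = m2 if strand == "D" else reverse_complement(m2)
        # offset = position in m1 aligned with the first column of `other`
        for offset in range(-(len(other) - 1), len(m1)):
            x: List[float] = []
            y: List[float] = []
            for i, col in enumerate(m1):
                j = i - offset
                if 0 <= j < len(other):
                    x.extend(col)
                    y.extend(other[j])
            score = pearson(x, y)
            if score is not None and (best is None or score > best[0]):
                best = (score, offset, strand)
    return best


acgt = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]  # A C G T
cg = [[0, 10, 0, 0], [0, 0, 10, 0]]                                   # C G
print(best_alignment(acgt, cg))
```

Plain correlation does not penalize short overlaps, so a single matching column can score as high as a long alignment. Ties are resolved in favor of the first alignment found.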
-format profiles
Return a table with one row for each possible alignment offset between two matrices, and various columns indicating the matching parameters (offset, strand, aligned width,...), the matching scores, and the consensus of the aligned columns of the matrices.
Matching profiles are convenient for drawing the similarity profiles, or for analyzing the correlations between various similarity metrics, but they are too verbose for the typical use of compare-matrices (detecting matches between a query matrix and a database of reference matrices). The formats "matches" and "table" are more convenient for basic use.
Skip comparison between a matrix and itself.
This option is useful when the program is used to compare all matrices of a given file to all matrices of the same file, to avoid comparing each matrix to itself.
Beware: the criterion for considering two matrices identical is that they have the same identifier. If two matrices have exactly the same content (in terms of occurrences per position) but different identifiers, they will be compared.
Perform matrix comparisons in direct (D), reverse complementary (R) or both orientations (DR, default option).
When the R or DR options are activated, all matrices of the second matrix file are converted to the reverse complementary matrix.
This option is useful to answer very particular questions, for example
Comparing motifs in a strand-insensitive way (-strand DR)
DNA-binding motifs are usually strand-insensitive. A motif may be detected in one given orientation by a motif-discovery algorithm, but annotated in the reverse complementary orientation in a motif database. For DNA-binding motifs, we thus recommend the DR option.
On the contrary, RNA-related signals (termination, poly-adenylation, miRNA) are strand-sensitive, and should be compared in a single orientation (-strand D).
Detecting reverse complementary palindromic motifs
An example of reverse complementary palindromic motif is tCAGswwsGTGa. When a motif is reverse complementary palindromic, the matrix is correlated to its own reverse complement.
Remark about a frequent misconception of biological palindromes
Reverse complementary palindromes are frequent in DNA signals (e.g. transcription factor binding sites, restriction sites, ...) because they correspond to a rotational symmetry in the 3D structure. Such symmetrical motifs are often characteristic of sites recognized by homodimeric complexes.
By contrast, simple string-based palindromes (e.g. CAGTTGAC) do not correspond to any symmetry from the biochemical point of view, because the 3D structure of the corresponding double helix is not symmetrical. The apparent symmetry is an artifact of the string-based representation, but the corresponding molecule has neither rotational nor translational symmetry.
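The distinction between the two kinds of palindromes can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the sequences CACGTG and GAATTC are examples added here (they do not come from the text above), whereas CAGTTGAC is the string-based palindrome quoted in the remark.

```python
# Translation table for the complement of unambiguous DNA residues.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every residue, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_string_palindrome(seq):
    """True if the string reads the same from left to right and right to left."""
    return seq == seq[::-1]


def is_revcomp_palindrome(seq):
    """True if the sequence is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


for seq in ("CAGTTGAC", "CACGTG", "GAATTC"):
    print(seq, is_string_palindrome(seq), is_revcomp_palindrome(seq))
```

CAGTTGAC is a palindrome only at the string level, whereas CACGTG and GAATTC are identical to their reverse complements.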
DNA signals can either be symmetrical (reverse complementary palindromes, tandem repeats) or asymmetrical.
Obsolete option for returning matrix names. Replaced by -return matrix_name. Maintained for backward compatibility.
List of fields to return (only valid for the formats "profiles" and "matches").
Supported return fields:
offset
Offset (shift) between the two compared matrices.
cor
Pearson's coefficient of correlation.
Ncor
Normalized correlation (default sorting criterion for the matching mode).
cov
Covariance.
SSD
Sum of squared distances.
NSW
Normalized Sandelin-Wasserman similarity.
SW
Sandelin-Wasserman similarity.
dEucl
Euclidean distance.
NdEucl
Normalized Euclidean distance.
NsEucl
Normalized Euclidean similarity.
dKL
Kullback-Leibler distance
matrix_number
Number of the matrices in the input files
matrix_id
Identifiers of the matrices
matrix_name
Names of the matrices
matrix_ac
Accession number of the matrices (TRANSFAC format makes a distinction between IDs and accession numbers).
width
Width of the matrices and the alignment
strand
Direct (D) or Reverse complementary (R) comparison
offset
Offset between the positions of the first and second matrix
pos
Relative positions of the aligned matrices (start, end, strand, width).
consensus
Aligned consensus. The residues of the consensus corresponding to aligned columns are displayed; non-aligned columns are replaced by dots.
offset_rank
During pairwise alignment, scores are computed for each offset and offsets are sorted according to the sorting criterion. The offset_rank indicates the rank of an offset in this sorted list. This is a "within-alignment" rank, which is useful in profile mode.
match_rank
In matching mode, ranks can be computed for all the selected metrics, and a mean rank is computed.
alignments_pairwise
Shifted matrices resulting from the pairwise alignments.
alignments_1ton
Shifted matrices resulting from the 1-to-N alignments.
alignments
Shifted matrices resulting from the alignments (pairwise and 1-to-N).
all
All supported output fields, including all metrics.
Field to sort the results. The sorting direction depends on the metric: ascending for dissimilarity metrics, decreasing for similarity metrics.
Supported sort fields:
offset, ascending (default sorting criterion for the profile mode)
Ncor, decreasing (default sorting criterion for the matching mode)
cor, decreasing
cov, decreasing
SSD, ascending
SW, decreasing
NSW, decreasing
dEucl, ascending
NdEucl, ascending
NsEucl, decreasing
dKL, ascending
Lower threshold on some parameter. Format=list of 'parameter value'.
Supported fields: rank, dEucl, cor, cov, ali_len, offset
Upper threshold on some parameter. Format=list of 'param value'.
Supported parameters: same as lth.
Parameters for the operation matrix_scan.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Sequence(s) to scan. All the formats supported in RSAT can be used as input (default: fasta).
Name of the file with input sequence(s) on the server. You need to supply either this parameter or the previous one (sequence).
Matrix/ces to scan with. The matrix format is specified with the option "matrix_format" (see below). Default format: tab.
Name of the file with input matrix/ces on the server. You need to supply either this parameter or the previous one (matrix).
Supported fields: fasta (default), IG (Intelligenetics), WC (wconsensus), raw.
Supported fields: tab, cb, consensus, gibbs, meme, assembly.
Delegates scanning to the C program matrix-scan-quick (developed by Matthieu Defrance). Evaluate if the quick mode is compatible with the selected output parameters; otherwise, run in the slower mode. Incompatible with CRER scanning and with the window background model.
Treatment of N characters. These characters are often used in DNA sequences to represent undefined or masked nucleotides.
skip
N-containing regions are skipped.
score
N-containing regions are scored. The probability of an N is 1 for both the background model and the matrix. The N residues will thus contribute neither positively nor negatively to the weight score of the N-containing fragment. This option can be useful to detect sites which are at the border of N-containing regions, or in cases where there are isolated N residues in the sequences.
Use the motif (degenerate) consensus as matrix name.
Pseudo-count for the matrix (default: 1).
The pseudo-count reflects the possibility that residues that were not (yet) observed in the model might however be valid for future observations.
The pseudo-count is used to compute the corrected residue frequencies.
If this option is called, the pseudo-weight is distributed in an equiprobable way between residues.
By default, the pseudo-weight is distributed proportionally to residue priors, except for the -window option where equipseudo is default.
Only scan with the top # matrices per matrix file.
This option is valid for some file formats containing multiple matrices where top matrices are generally more informative.
Background model is a tab-delimited specification of oligonucleotide frequencies.
Background model file (tab-delimited specification of oligonucleotide frequencies) on the server.
To use a precalculated background model from RSAT, choose the organism corresponding to the background model.
Works with background and markov options.
To use a precalculated background model from RSAT. Works with organism and markov options.
Type of sequences used as background model for estimating expected oligonucleotide frequencies.
Supported: upstream, upstream-noorf, upstream-noorf-rm, intergenic
Calculate background model from the input sequence set.
This option requires specifying the order of the background model with the option markov.
Size of the sliding window for the background model calculation.
This option requires specifying the order of the background model with the option markov (suitable for low-order models only: markov 0 or 1).
Order of the markov chain for the background model. This option is incompatible with the option background.
Pseudo frequency for the background models. Value must be a real between 0 and 1.
If this option is not specified, the pseudo-frequency value depends on the background calculation.
For -bginput and -window, the pseudo frequency is automatically calculated from the length (L) of the sequence with the following formula:
sqrt(L) / (L + sqrt(L))
For -bgfile, default value is 0.01.
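As a worked illustration of the length-based formula, the following Python sketch computes the square root of L divided by (L plus the square root of L). The function name is hypothetical; only the formula itself comes from the description of this option.

```python
import math


def length_based_pseudo_frequency(seq_length):
    """Pseudo frequency sqrt(L) / (L + sqrt(L)) for a sequence of length L.

    Hypothetical helper illustrating the formula; it is not part of matrix-scan.
    """
    if seq_length <= 0:
        raise ValueError("sequence length must be positive")
    root = math.sqrt(seq_length)
    return root / (seq_length + root)


# The value decreases as the sequence gets longer.
for length in (100, 10000, 1000000):
    print(length, round(length_based_pseudo_frequency(length), 5))
```

For L = 10000 the result is 100 / 10100, which is close to the 0.01 default used for -bgfile.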
If the training sequence length (L) is known, the value can be set with the -bg_pseudo option to sqrt(L) / (L + sqrt(L)).
List of fields to return. Supported fields: sites, rank, limits, normw, bg_model, matrix, freq_matrix, weight_matrix, distrib.
Sort score distribution by decreasing value of significance, if value = 1.
By default, the score distributions are sorted by score (weight).
Lower threshold on some parameter. Format=list of 'parameter value'.
Supported fields: score, pval, eval, sig, normw, proba_M, proba_B, rank, crer_sites, crer_size, occ, occ_sum, inv_cum, exp_occ, occ_pval, occ_eval, occ_sig, occ_sig_rank
Upper threshold on some parameter. Format=list of 'param value'.
Supported parameters: same as lth.
Scan 1 or 2 strands for DNA sequences.
Level of verbosity (detail in the warning messages during execution).
Define the origin for the calculation of positions.
-origin -0 defines the end of each sequence as the origin.
The matching positions are then negative values, providing the distance between the match and the end of the sequence.
Number of decimals displayed for the weight score.
Assign one separate feature ID per CRER. This option is convenient to distinguish separate CRERs.
Parameters for the operation convert_matrix.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Matrix (or assembly or features) you want to convert.
Format for the background model (prior) files. Supported: oligo-analysis, MotifSampler, meme, dyads.
Pseudo frequency for the background models. Value must be a real between 0 and 1 (default: 0.01).
Input matrix format. Supported: alignace, assembly, cb, clustal, consensus, feature, gibbs, infogibbs, meme, motifsampler, tab, transfac.
Output matrix format. Supported: consensus, patser, tab, transfac.
Result type (matrix content). Supported: consensus, counts, frequencies, info, information, logo, margins, parameters, profile, sites, wdistrib, weights.
desc | asc | alpha sort_key.
Sort matrices according to the specified attribute (sort_key). The sorting can be done on numerical values, either in descending (desc) or ascending (asc) order.
It can also be done in alphabetical order (alpha).
The key must be one of the numeric parameters of the matrices (e.g. information.content, E-value, ...).
This option is convenient, for example, to sort matrices from MotifSampler according to their information content:
-sort desc MS.ic
Maximal number of matrices to return.
Some of the input formats can contain several matrices in a single file (e.g. consensus, meme, MotifSampler).
By default, all the matrices are parsed and exported. The option -top restricts the number of matrices to be exported.
Pseudo-weight used for the calculation of the weight matrix (default: 1).
If value is 1, the pseudo-weight is distributed in an equiprobable way between residues.
By default, the pseudo-weight is distributed proportionally to residue priors.
Base for the logarithms used in the scores involving a log-likelihood (weight and information content). Default: exp(1) (natural logarithms).
A common alternative to natural logarithms is to use logarithms in base 2, in which case the information content is computed in bits.
Number of decimals to print for real matrices (frequencies, weights, information) or to compute score distributions.
Warning: for the computation of score distributions, the computing time increases exponentially with the number of decimals.
We recommend restricting the precision to 2 decimals for the weight; this is generally more than sufficient.
Number of permuted matrices to return.
Matrix columns are permuted so that the total information content remains identical to the original matrix.
Note that the output format for permuted matrices is tab.
Maximal width of the profile histogram (units = number of characters).
Convert the matrix to its reverse complement if value = 1.
Parameters for the operation random_seq.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Length of sequence to generate.
Number of sequences to generate.
Format of sequence(s) to generate.
A newline character will be inserted in the sequence every # bases, where # is the number provided.
Default is 70. A value of 0 will prevent newline insertion.
Type of sequence(s) to generate (protein | DNA | other).
Seed for the random generator.
Alphabet. Must be followed by residue frequencies expressed precisely this way: a:t # c:g #
Expected frequencies of oligomers in sequence(s) to generate. Indicate the file that contains expected oligomer frequencies.
When this option is used, the sequences are generated according to a Markov chain.
Name of the file with expected frequencies on the server.
Background model. Automatically load a pre-calibrated expected frequency file from the RSAT genome distribution.
When this option is used, the options organism and oligo_length are also required, to indicate the organism and the oligonucleotide length, respectively.
This option is incompatible with the option expfreq.
Type of sequences used as background model for estimating expected oligonucleotide frequencies (supported models):
- equi (equiprobable residue frequencies [default]),
- upstream (all upstream sequences, allowing overlap with upstream ORFs. Requires specifying a model organism),
- upstream-noorf (all upstream sequences, preventing overlap with upstream ORFs. Requires specifying a model organism), and
- intergenic (intergenic frequencies. Whole set of intergenic regions, including upstream and downstream sequences. Requires specifying a model organism).
Name of the organism when using a background model.
Length of oligomer when using a background model.
Length file. Allows generating random sequences with the same lengths as a set of reference sequences.
The length file contains two columns: sequence ID (ignored) and sequence length.
Parameters for the operation fetch_sequences.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
The input file should be in bed format.
Input file on the server.
Use as input a file available on a remote Web server (e.g. a bed file on your Galaxy account). This option is mutually exclusive with the previous two options.
Genome version (e.g. mm9, hg19).
Format for sequence headers (UCSC | galaxy).
Extend each region by # base pairs on the upstream side (i.e. left side for + strand, right side for - strand).
Extend each region by # base pairs on the downstream side (i.e. right side for + strand, left side for - strand).
Extend each region by # base pairs on both upstream and downstream sides.
Reference from which the sequences should be fetched.
segment (default)
Retrieve sequences from the start to the end positions of each feature (possibly extended with the options -upstr_ext, -downstr_ext or -extend).
start | end | center
Retrieve sequences relative to the start, the end or the central position of each feature, respectively.
This option is generally combined with the options -upstr_ext, -downstr_ext or -extend, in order to retrieve sequences of a fixed width around the reference coordinate (e.g. 200 bp on each side of peak centers).
Only consider the # top features of the bed file as queries.
Send queries to UCSC by chunk of # features (default: chunk=10000).
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml, adj_matrix.
Compute an edge color for the GML output. The color intensity is proportional to the weight of the edge.
All weights in the column indicated by the -wcol argument must thus be real values.
Supported: green, blue, red, fire, grey.
Output format. Supported: tab, gml, dot, adj_matrix.
A graph in the format specified by the informat tag.
Specify a column of the input graph that contains an edge weight or an edge label for the tab-delimited format (no default).
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Specify a column of the input graph that contains the color of the edge (no default).
Specify a column of the input graph that contains the color of the source node (no default).
Specify a column of the input graph that contains the color of the target node (no default).
Specify the column containing the paths.
The graph is considered as being undirected (useful for the adjacency matrix input and output).
The nodes belonging to different paths are duplicated with this option.
Specify whether the disposition of each node has to be calculated using the $RSAT/bin/fr_layout program.
This option is only useful for GML output.
Calculate the edge width for the GML output. The width is proportional to the weight of the edge.
This value can only be computed for the GML output. All weights in the column indicated by the -wcol argument must thus be real values.
Column containing the X position of the target node.
Column containing the Y position of the target node.
Column containing the X position of the source node.
Column containing the Y position of the source node.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml, adj_matrix.
Output format. Supported: tab, gml, dot, adj_matrix.
A graph in the format specified by the informat tag.
Specify a column of the input graph that contains an edge weight or an edge label for the tab-delimited format (no default).
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Specifies whether the edges must be considered as directed, i.e., an edge from node A to node B is different from an edge from B to A (by default, edges are not directed).
Specifies whether more than one edge may link two nodes (by default, duplicated edges are not allowed).
Allows self loops (by default, self loops are not allowed).
Nodes that have to be removed from the graph (if existing). The node names must be separated by commas.
Number of edges to add. This value can either be a percentage value or a discrete number.
Number of edges to remove. This value can either be a percentage value or a discrete number.
Number of edges to add. This value can either be a percentage value or a discrete number.
Number of edges to remove. This value can either be a percentage value or a discrete number.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml, adj_matrix.
A graph in the format specified by the informat tag.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Minimum size of the clique to return.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml, adj_matrix.
Output format. Supported: ps, png, jpeg.
Calculate the edge width for the GML output. The width is proportional to the weight of the edge. This value can only be computed for the GML output. All weights in the column indicated by the -wcol argument must thus be real values.
A graph in the format specified by the informat tag.
Specify a column of the input graph that contains an edge weight or an edge label for the tab-delimited format (no default).
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Specify a column of the input graph that contains the color of the edge (no default).
Specify a column of the input graph that contains the color of the source node (no default).
Specify a column of the input graph that contains the color of the target node (no default).
Calculates the layout according to the Fruchterman and Reingold algorithm.
This option must be provided if the input graph is not GML.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Output format. Supported: png, jpeg.
Returns an HTML file that loads the heatmap. The name of this file is the name of the output file on the server with the html extension.
A graph in the format specified by the informat tag.
Use this option if the first column contains the row names.
Using this option, the values are not written in the cells of the heatmap.
Width of the columns (in pixels). If the row height is too small, the label of the heatmap will not be indicated. (Default: 50 px)
Height of the rows (in pixels). If the row height is too small, the label of the heatmap will not be indicated. (Default: 30 px)
Minimal value of the heatmap. By default, this value is the minimal value of the input file.
If the specified value is larger than the minimal value of the heatmap, then the minimal value of the heatmap will be used as minimal value.
Maximal value of the heatmap. By default, this value is the maximal value of the input file.
If the specified value is smaller than the maximal value of the heatmap, then the maximal value of the heatmap will be used as maximal value.
Color of the intensity gradient of the heatmap. Default is grey.
Supported: green, blue, red, fire, grey.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format of query graph. Supported: tab, gml.
Input format of reference graph. Supported: tab, gml.
Output format. Supported: tab, gml, dot.
Q (weights of the query graph), R (weights of the reference graph), sum (sum of the weights of the two graphs), mean (mean of the weights of the two graphs), mean.g (geometrical mean of the weights of the two graphs), min (minimum weight), max (maximum weight), Q::R (weight of the two graphs) (default).
The reference graph in the format specified by the informat tag.
The query graph in the format specified by the informat tag.
Specify a column of the query input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the query input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the query input graph that contains the target nodes for the tab-delimited format (default = 2).
Specify a column of the reference input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the reference input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the reference input graph that contains the target nodes for the tab-delimited format (default = 2).
intersection, union, difference, R.and.Q, Q.and.R, Q.or.R, Q.not.R, R.not.Q, Q.and.R+Q, Q.and.R+R, R.and.Q+Q, R.and.Q+R, intersection+Q, intersection+R.
Indicates whether the graphs must be considered as directed, i.e., an arc from node A to node B is different from an arc from B to A.
Indicates whether the graphs can admit self-loops, i.e., an arc from a node to itself.
Note that the graphs do not need to contain actual self-loops; the question is whether it would be acceptable for the considered input graphs to contain self-loops.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml.
The degree of all nodes will be computed.
A graph in the format specified by the informat tag.
A file containing the nodes for which you want to know the degree.
Specify a column of the input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml.
The statistics will be computed for all nodes.
Return type: degree, closeness, betweenness. More than one statistic can be returned by separating the fields with ','.
To return all implemented statistics, you can use all.
A graph in the format specified by the informat tag.
A file containing the nodes for which you want to know the degree and other statistics.
Specifies whether the graph is directed or not (i.e. edge A-B corresponds to edge B-A).
In this case, the betweenness and the closeness calculation will be rather different.
By default the graph is not directed.
Specify a column of the input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml.
A graph in the format specified by the informat tag.
Clustering as a tab-delimited file.
Determines how the node membership will be calculated (edge, weight, relw).
Number of decimals to print for the membership. Note that by selecting this option, the entries of the membership-vectors (rows) won't sum up to 1.
Specify a column of the input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml.
Output format. Supported: tab, gml, dot.
A graph in the format specified by the informat tag.
Randomization type:
- scratch (de novo graph): don't forget to specify the number of nodes and of edges,
- ER (Erdos-Renyi randomization): corresponds to the randomization of an input graph, keeping the nodes and the number of edges but changing its characteristics,
- node_degree: each node will keep the same degree as in the input graph (edge randomization),
- node_degree_distrib: the global distribution of node degree will remain the same as in the input graph.
Specify a column of the input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Number of requested edges (for scratch randomization type).
Maximal degree of the nodes in the ER random graph.
Number of requested nodes (for scratch randomization type).
Allows self loops (by default, self loops are not allowed).
Mean value of the weight of the arcs.
This argument can only be used with the scratch and ER randomization types and must be combined with the -sd option.
Standard deviation value of the weight of the arcs.
This argument can only be used with the scratch and ER randomization types and must be combined with the -mean option.
Indicates whether the graphs must be considered as directed, i.e., an arc from node A to node B is different from an arc from B to A.
Prevent the ER / scratch graph from containing nodes with no neighbour.
Specifies whether more than one edge may link two nodes (by default, duplicated edges are not allowed).
Only compatible with ER randomization of a graph. Source and target nodes stay source and target nodes in the randomized graph.
This option can only be used with ER randomization type and if the input graph is weighted.
Using this option will randomly generate the weights of the output random graph according to a normal distribution.
The mean and standard deviation can then be chosen (-mean and -sd options) or will be calculated according to the weights of the input graph.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
A graph in the tab-delimited format. First column: source node, second column: target node, third column: edge weight.
Sets the main inflation value. This value is the main handle for affecting cluster granularity. It is usually chosen somewhere in the range [1.2-5.0].
-I 5.0 will tend to result in fine-grained clusterings, and -I 1.2 will tend to result in very coarse-grained clusterings.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
A graph in the format required by RNSC: an adjacency list in which each edge appears only once.
The vertices are labelled with the integers 0, 1, ..., n-1. The list of neighbours for vertex v appears as v n_1 n_2 ... n_x -1.
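As an illustration of the adjacency-list layout described above, here is a minimal Python sketch that turns a list of (source, target) pairs into lines of the form v n_1 n_2 ... n_x -1. The function name edges_to_rnsc is hypothetical and is not part of RSAT or RNSC. The sketch assumes that each edge is listed under its lower-numbered endpoint, that every vertex gets a line even when it lists no neighbours, and that self-loops are dropped; the description above only states that each edge appears once, so these choices should be checked against the output of convert-graph.

```python
def edges_to_rnsc(edges):
    """Convert (source, target) pairs into RNSC-style adjacency lines.

    Returns (index, lines), where index maps each node name to its integer
    label and lines holds one "v n_1 ... n_x -1" string per vertex.
    """
    # Label the nodes 0..n-1 in order of first appearance.
    index = {}
    for a, b in edges:
        for node in (a, b):
            if node not in index:
                index[node] = len(index)

    # Assumption: each undirected edge is stored once, under its
    # lower-numbered endpoint. Duplicate edges collapse in the set.
    neighbours = {v: set() for v in range(len(index))}
    for a, b in edges:
        i, j = index[a], index[b]
        if i == j:
            continue  # assumption: self-loops are skipped
        neighbours[min(i, j)].add(max(i, j))

    lines = []
    for v in range(len(index)):
        fields = [v] + sorted(neighbours[v]) + [-1]
        lines.append(" ".join(str(x) for x in fields))
    return index, lines


if __name__ == "__main__":
    _, rnsc_lines = edges_to_rnsc([("A", "B"), ("B", "C"), ("A", "C")])
    print("\n".join(rnsc_lines))
```

In practice the convert-graph program remains the supported way to produce this format; the sketch is only meant to make the layout concrete.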
The input network in the correct format can be produced by the convert-graph program.
Allow no more than "num" clusters. "num" must be between 2 and n, where n is the number of vertices in the graph.
If this option is not specified or an invalid value is given, n clusters are used.
Set the tabu length to "num". Default value is 1. Note that when the tabulist option is used, vertices can appear on the tabu list more than once, and moving them is only forbidden when they are on the tabu list more than TabuTol times, where TabuTol is the tabu list tolerance.
Set the tabu list tolerance to "num". Default value is 1.
The tabu list tolerance is the number of times a vertex must appear on the tabu list before moving it is forbidden.
Set the naive stopping tolerance to "num". Default value is 5. This is the number of steps that the naive scheme will continue without improving the best cost.
If you run the scaled scheme, using a higher naive stopping tolerance isn't likely to improve your results.
Set the scaled stopping tolerance to "num". Default value is 5. This is the number of steps that the scaled scheme will continue without improving the best cost.
Setting the tolerance to 0 will cause the algorithm to skip the scaled scheme.
Run "num" experiments. The best final clustering over all experiments will be written to file. Default is 1.
Set the diversification frequency to "num". Without this option, no diversification will be performed.
If the shf_div_len flag is also used, then "num" is the shuffling diversification frequency.
If the shf_div_len flag is not used, then "num" is the destructive diversification frequency.
It is recommended that the shf_div_len flag be used, because destructive diversification isn't much help.
Set the shuffling diversification length to "num". This means that the last "num" moves in the diversification period will be diversification moves.
Don't set this to be higher than the diversification frequency.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input graph. Supported: tab, gml.
Direction of the neighbours (default all). Supported: in, out, all. This option cannot be used with the stats output or when specifying the number of steps.
The neighbours of all nodes will be searched.
Only valid when the number of steps is equal to 1. The output file is presented differently, with one line for each seed node.
Include each node in its neighborhood, with a distance of 0, even if there is no self-loop at this node.
This makes it possible to extract the node together with its neighborhood, rather than the neighborhood only (default). This option cannot be used with the stats option.
A graph in the format specified by the informat tag.
A list of nodes for which you want to know the neighbours.
Specify a column of the input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
Maximal number of steps between a seed node and its neighbours. Default: 1.
Return type.
Accepted values: 'server' (result is stored on a file on the server), 'client' (result is directly transferred to the client),
'both' (result is stored on the server and transferred to the client), and ticket (an identifier, allowing to monitor the job status and retrieve the result when it is done, is returned to the client).
Default is 'both'.
Input format. Supported: tab, gml.
Return format. Supported: table, cluster, graph.
Output format. Supported: tab, gml, dot.
A graph in the format specified by the informat tag.
Specification of the clusters to which the nodes belong.
Specify a column of the input graph that contains an edge weight or an edge label (default none) for the tab-delimited format.
Specify a column of the input graph that contains the source nodes for the tab-delimited format (default = 1).
Specify a column of the input graph that contains the target nodes for the tab-delimited format (default = 2).
As some nodes may belong to more than one group, using this option will duplicate the nodes belonging to more than one group.
Using this option, only the first column of the cluster file will be taken into account.
The output graph will thus consist of the graph induced by all nodes of the first column.
Ticket of a job submitted with its output_choice set to 'ticket'.
Location of the result file on the server. This can be used as input for a further request.
The stand-alone command executed on the server.
The results.
Return status ('Running' or 'Done') of a job submitted with its output_choice set to 'ticket'.
Return result of a job submitted with its output_choice set to 'ticket'.
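The ticket workflow described above (submit a job with output_choice set to 'ticket', check its status until it is 'Done', then fetch the result) can be sketched as a small polling loop. This is a minimal Python sketch under stated assumptions, not the RSAT client API: wait_for_result, get_status and get_result are hypothetical names, and the two callables stand in for whatever SOAP calls the client library exposes for monitoring a job and getting its result. Only the status strings 'Running' and 'Done' come from the text above.

```python
import time


def wait_for_result(ticket, get_status, get_result,
                    interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll a job until its status is 'Done', then return its result.

    get_status(ticket) and get_result(ticket) are caller-supplied
    callables (hypothetical wrappers around the web service calls).
    """
    for _ in range(max_polls):
        status = get_status(ticket)
        if status == "Done":
            return get_result(ticket)
        if status != "Running":
            # Only 'Running' and 'Done' are documented; anything else
            # is treated as an error in this sketch.
            raise RuntimeError(f"unexpected job status: {status!r}")
        sleep(interval)
    raise TimeoutError(f"job {ticket} still running after {max_polls} polls")


if __name__ == "__main__":
    # Stand-in callables simulating a job that finishes on the third check.
    statuses = iter(["Running", "Running", "Done"])
    result = wait_for_result(
        "example-ticket",
        get_status=lambda t: next(statuses),
        get_result=lambda t: f"result for {t}",
        interval=0.0,
    )
    print(result)
```

Passing the two operations in as callables keeps the loop independent of any particular SOAP library; the polling interval and the maximum number of polls are arbitrary defaults chosen for the sketch.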
Returns upstream, downstream or coding DNA sequences for a list of query genes.
Returns upstream, downstream or coding DNA sequences for a list of query genes and organisms.
Returns upstream, downstream or coding DNA sequences for a list of query genes (in the EnsEMBL database).
Mask repeated fragments of an input sequence.
Analysis of the statistical significance of all the oligomers of a given size in a sequence. Commonly used to detect over-represented oligonucleotides in a set of promoter sequences.
Compare oligonucleotide occurrences between two input sequence files (test and control), and return oligos that are significantly enriched in one of the files relative to the other.
Pipeline for discovering motifs from ChIP-seq (or ChIP-chip) peak sequences.
Analysis of the statistical significance of all the spaced dyads of a given size in a sequence. Commonly used to detect over-represented spaced dyads in a set of promoter sequences.
Calculates the positional distribution of oligonucleotides in a set of sequences, and detects those which significantly depart from a homogeneous distribution.
Assemble a set of oligonucleotides or dyads into groups of overlapping patterns (assemblies).
Searches all occurrences of a pattern within DNA sequences.
Interconversions between various formats of feature description.
Draws a graphical map of features (e.g. results of pattern matching) in a set of sequences.
Detect phylogenetic footprints by applying dyad-analysis in promoters of a set of orthologous genes.
Get orthologous genes.
Infer operon.
Get information about genes.
List RSAT supported organisms.
List RSAT supported motif databases.
Converts a tab-delimited file into an HTML table.
Converts a PSI XML file into a tab-delimited file.
Computes, from a set of scored results associated with validation labels, the derived statistics (Sn, PPV, FPR), which can be further used to draw a ROC curve.
Plot a graph and export it.
This script takes a group of numbers (real or integers) and outputs their distribution among classes.
Converts a sequence between two formats (e.g. fasta -> raw).
Compare two class files (the query file and the reference file). Each class of the query file is compared to each class of the reference file. The number of common elements is reported, as well as the probability to observe at least this number of common elements by chance alone.
Interconversions between different formats of cluster files.
This program takes as input a contingency table, and calculates various matching statistics between the rows and columns. The description of these statistics can be found in Brohee and van Helden (2006).
Create a contingency table from a two-column file.
Compare two collections of position-specific scoring matrices (PSSM), and return various similarity statistics + matrix alignments (pairwise, one-to-n).
Scan sequences with one or several position-specific scoring matrices (PSSM) to identify instances of the corresponding motifs (putative sites). This program supports a variety of background models (Bernoulli, Markov chains of any order).
Performs inter-conversions between various formats of position-specific scoring matrices (PSSM). The program also performs a statistical analysis of the original matrix to provide different position-specific scores (weight, frequencies, information contents), general statistics (E-value, total information content), and synthetic descriptions (consensus).
Returns the theoretical distribution of matrix weight within the defined background model.
Generates random sequences.
Returns DNA sequences for a list of coordinates in BED format.
Convert graphs between different formats.
Alter a graph either by adding or removing edges or nodes.
Find all cliques in a graph.
Produces the figure of a graph.
Produces the figure of a heatmap.
Computes the union, difference or intersection of two graphs.
Calculates the in / out / global degree for a selection of seed nodes.
Calculate the node degree, the closeness and the betweenness of each node and specify whether this node is a seed or a target node.
Map a clustering result onto a graph, and compute the membership degree between each node and each cluster, on the basis of edges linking this node to the cluster.
Generate random graphs either from scratch or from an existing graph using different randomization models.
Compares a graph with a classification/clustering file.
Find the neighbours up to a certain distance of a collection of seed nodes.
Clustering via Stijn van Dongen's MCL algorithm.
Clustering via Andrew King's RNSC algorithm.
Monitoring the status of a job.
Get result of a job.
Web services for the Regulatory Sequence Analysis Tools (RSAT). Tools developed by Jacques van Helden (jvanheld@bigre.ulb.ac.be), SOAP/WSDL interface developed by Olivier Sand (oly@bigre.ulb.ac.be).