RSAT - convert-variations manual






Performs inter-conversions between different formats of polymorphic variations.


Walter Santana-Garcia
Jacques van Helden
Alejandra Medina-Rivera


Genetic variations


The program supports three formats: the Genome Variant Format (GVF), the Variant Call Format (VCF) and the RSAT variation format (varBed).

Genome Variant Format (GVF)

"The Genome Variant Format (GVF) is a type of GFF3 file with additional pragmas and attributes specified. The GVF format has the same nine column tab delimited format as GFF3 and all of the requirements and restrictions specified for GFF3 apply to the GVF specification as well." (quoted from the Sequence Ontology)

A GVF file starts with a header that provides general information about the file content: format version, date, data source, and the lengths of the chromosomes/contigs covered by the variations.

 ##gff-version 3
 ##gvf-version 1.07
 ##file-date 2014-09-21
 ##genome-build ensembl GRCh38
 ##data-source Source=ensembl;version=77;url=
 ##file-version 77
 ##sequence-region Y 1 57227415
 ##sequence-region 17 1 83257441
 ##sequence-region 6 1 170805979
 ##sequence-region 1 1 248956422
 ## [...]
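The `##sequence-region` pragmas give the coordinate range of each chromosome or contig. The following is a minimal Python sketch, not part of RSAT, showing how these lengths can be collected from the header lines. The function name `parse_sequence_regions` is hypothetical.

```python
def parse_sequence_regions(header_lines):
    """Map each sequence ID to its (start, end) from ##sequence-region pragmas."""
    regions = {}
    for line in header_lines:
        line = line.strip()
        if line.startswith("##sequence-region"):
            # Pragma layout: ##sequence-region <seqid> <start> <end>
            _, seqid, start, end = line.split()
            regions[seqid] = (int(start), int(end))
    return regions


header = [
    "##gff-version 3",
    "##gvf-version 1.07",
    "##sequence-region Y 1 57227415",
    "##sequence-region 17 1 83257441",
]
regions = parse_sequence_regions(header)
print(regions["Y"])  # (1, 57227415)
```

Other pragmas (`##gvf-version`, `##genome-build`, ...) are simply skipped here.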

This header is followed by the actual description of the variations, in a tab-delimited format complying with the GFF3 format.

 Y       dbSNP   SNV     10015   10015   .       +       .       ID=1;variation_id=23299259;Variant_seq=C,G;Dbxref=dbSNP_138:rs113469508;allele_string=A,C,G;evidence_values=Multiple_observations;Reference_seq=A
 Y       dbSNP   SNV     10146   10146   .       +       .       ID=2;variation_id=26647928;Reference_seq=C;Variant_seq=G;evidence_values=Multiple_observations,1000Genomes;allele_string=C,G;Dbxref=dbSNP_138:rs138058540;global_minor_allele_frequency=0|0.0151515|33
 Y       dbSNP   SNV     10153   10153   .       +       .       ID=3;variation_id=21171339;Reference_seq=C;Variant_seq=G;evidence_values=Multiple_observations,1000Genomes;allele_string=C,G;Dbxref=dbSNP_138:rs111264342;global_minor_allele_frequency=1|0.00229568|5
 Y       dbSNP   SNV     10181   10181   .       +       .       ID=4;variation_id=47159994;Reference_seq=C;Variant_seq=G;evidence_values=1000Genomes;allele_string=C,G;Dbxref=dbSNP_138:rs189980076;global_minor_allele_frequency=0|0.00137741|3

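The last (ninth) column holds most of the variation-specific information (variant identifier, reference and variant alleles, dbSNP cross-reference, global minor allele frequency) as semicolon-separated tag=value pairs, which makes it hard to read. This is because the format was initially defined to describe generic genomic features, so all the variation-specific attributes are packed into this last column.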
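To make the attributes column easier to work with, it can be unpacked into a dictionary. The sketch below is a hypothetical helper (`parse_gvf_line` is not an RSAT function); it assumes the nine columns are separated by tabs, as GFF3 requires (the example lines above are displayed with spaces), and it decodes URL-escaped values as allowed by GFF3.

```python
from urllib.parse import unquote


def parse_gvf_line(line):
    """Split a GVF data line into its GFF3 columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, so_type, start, end, score, strand, phase, attributes = fields
    attrs = {}
    for pair in attributes.split(";"):
        if pair:
            # Each attribute is tag=value; split on the first '=' only
            tag, _, value = pair.partition("=")
            attrs[tag] = unquote(value)
    return {
        "seqid": seqid, "source": source, "type": so_type,
        "start": int(start), "end": int(end), "strand": strand,
        "attributes": attrs,
    }


line = "\t".join(["Y", "dbSNP", "SNV", "10015", "10015", ".", "+", ".",
                  "ID=1;Variant_seq=C,G;Dbxref=dbSNP_138:rs113469508;Reference_seq=A"])
rec = parse_gvf_line(line)
print(rec["attributes"]["Variant_seq"].split(","))  # ['C', 'G']
```

Multi-valued attributes such as `Variant_seq=C,G` stay as comma-separated strings and can be split when needed.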

Variant Call Format (VCF)

This format was originally defined for the 1000 Genomes Project. It is no longer maintained. The converter supports it merely for the sake of backwards compatibility.
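To illustrate how the two formats relate, here is a minimal Python sketch, assuming single-nucleotide variants on the + strand, that maps one GVF SNV line onto the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is not the algorithm used by convert-variations, and the function name `gvf_snv_to_vcf` is hypothetical. It produces a data line only; a complete VCF file also needs its header lines.

```python
def gvf_snv_to_vcf(gvf_line):
    """Convert one GVF SNV line into a minimal VCF data line (sketch)."""
    cols = gvf_line.rstrip("\n").split("\t")
    seqid, _source, so_type, start, _end, _score, strand, _phase, attr_col = cols
    if so_type != "SNV":
        # Indels need a VCF padding base taken from the reference genome; not handled here
        raise ValueError(f"only SNV records are handled, got {so_type}")
    if strand == "-":
        # Assumption: alleles are given on the + strand, as in the example lines
        raise ValueError("minus-strand records are not handled in this sketch")
    attrs = dict(pair.split("=", 1) for pair in attr_col.split(";") if pair)
    # Use the first Dbxref entry (e.g. dbSNP_138:rs113469508) as the VCF ID
    dbxref = attrs.get("Dbxref", "").split(",")[0]
    var_id = dbxref.split(":", 1)[1] if ":" in dbxref else "."
    # For single-base variants, the 1-based GVF start is the VCF POS
    return "\t".join([
        seqid, start, var_id,
        attrs["Reference_seq"],
        attrs["Variant_seq"],  # already comma-separated for multi-allelic sites
        ".", ".", ".",         # QUAL, FILTER and INFO left empty
    ])


line = "\t".join(["Y", "dbSNP", "SNV", "10015", "10015", ".", "+", ".",
                  "ID=1;Variant_seq=C,G;Dbxref=dbSNP_138:rs113469508;Reference_seq=A"])
print(gvf_snv_to_vcf(line))
```

The multi-allelic record `Variant_seq=C,G` maps directly onto the comma-separated VCF ALT column.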

RSAT variation format (varBed)

Tab-delimited format with a specific column order, used as input by retrieve-variation-seq.

This format offers several advantages for scanning variations with position-specific scoring matrices.


A tab-delimited file in the selected output format.


Variants to be converted

Variation data to be converted. Supported formats: GVF, VCF or varBed.

Input format

Format of the input variation data. Supported formats: GVF, VCF or varBed.

Output format

Format of the desired output variation data. Supported formats: GVF, VCF or varBed.


For further inquiries, please contact Jacques van Helden or ask a question to the RSAT team!