QGene data format

The standard format for importing marker, map, and trait data into QGene is the .qdf file, which is structured as follows:

[Header]
Study name MY1 location means 10.8.07
Mating string r
Genotype symbols ABHxx-
Parent1 RT0034
Parent2 Cypress
[Locus]
AP2882 1 0 AAAAA-BB-BBBAABAABAAAAAAAAAABABBABBBBBBBAAAAAAABAAABBAAAAABABBBAAABAAABBBBBBBBBBBAAAABAAABBBBBBBAAABBBAABAHAAAAABBABAABBAAAAAABAA
RM10149 1 14.9 AAAAAAAABBBBABBAB-AAAAAAAAAAAABABBBBB-BBA-AAAAABAAHBBAABAABABBBABHBA-AABABBABBBABBAAABBABBBAAAABAAABBBAAAABAABAABBABAABBBAAAAABAB
[Trait]
AMY_AR M 21.8 21.85 22.7 22.65 22.15 23.3 23.4 23.9 22.95 23.7 22.8 21.5 23.6 20 . . 22.2 21.8 23.4 23.95 24.2 23.3 22.9 24.4 21.45 23.15 23.9 24.55 23.7 23.8 21.1 22.7 23.5 22.9 24.1 23.6 22.8 . 23.5 . 21.25 22 20.9 24.95 24.05 23.8 24.2 22.95 . 24.3 23.85 21.3 21.6 22.7 24.6 23.55 22.8 20.75 23.6 23.95 22.9 23.8 24.1 22 24.85 22.55 23.55 23.95 22.5 24.1 24.6 21.45 24.3 . 22.9 24 23.8 24.4 21.7 24.3 23.7 22.7 23.4 23.4 24.55 24.95 21.6 24.6 20.1 23.95 24.25 23.15 24.2 23.1 22.3 24.05 23.9 21.9 23.2 22.1 21.85 . 23.85 23.65 24 21.9 . 21.75 22.9 22.4 22.85 20.85 22.9 23.95 22 23.9 23.5 23.6 21.15 22.45 23.8 22.4 24.1 24.35 24.1 23.65 23.6 23.95 .
BLANKS_AR M 0 35 46.65 20 31.7 26.7 21.7 0 31.65 0 38.3 6.65 31.7 0 80 70 33.5 33.3 30 21.65 30 21.65 16.7 16.65 29.2 14.15 25.85 13.35 31.65 33.35 0 0 31.65 36.7 23.3 27.5 23.3 0 21.7 . 5 36.65 40 5.85 47.5 0 27.5 25 . 26.65 30 38.35 41.5 30 20 19.15 10 31.65 15.8 13.35 20.8 62.5 31.65 23.3 7.5 0 0 0 0 55 15 15 22.5 74.15 25 16.65 45 18.35 0 30 36.65 30 38.35 0 31.65 18.3 35.85 26.7 0 35 0 0 11.65 25.8 33.3 15.85 33.3 61.65 16.7 66.65 19.15 0 22.5 35 25 43.15 40 0 0 20 65 0 0 0 11.7 36.7 0 11.65 16.65 19.15 0 16.65 68.35 10 36.7 31.65 0 30 0

Starting with version 4.3, which allows eQTL analysis, the .qdf format can also accommodate e-trait data and the markers used for an eQTL experiment. In this case one line of the header must consist of the word eQTL.

Explanation of .qdf format

The three content keywords in square brackets tell QGene the content of the material following them, up to the next keyword.
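The keyword rule above can be illustrated with a short Python sketch. It is not part of QGene and only approximates how such a file might be read. It assumes that each keyword stands alone on its line and is spelled exactly as shown; the function name split_qdf_sections is made up for this example.

```python
KEYWORDS = ("[Header]", "[Locus]", "[Trait]")

def split_qdf_sections(text):
    """Collect the non-blank lines that follow each section keyword."""
    sections = {keyword: [] for keyword in KEYWORDS}
    current = None  # keyword of the section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in KEYWORDS:
            current = line  # everything up to the next keyword belongs here
        elif current is None:
            raise ValueError(f"content before any section keyword: {line!r}")
        else:
            sections[current].append(line)
    return sections

sample = """[Header]
Study name demo
Mating string r
[Locus]
M1 1 0 AABBH-
[Trait]
T1 M 1.0 2.5 . 3.0 4.1 2.2
"""
sections = split_qdf_sections(sample)
print(sections["[Locus]"])  # ['M1 1 0 AABBH-']
```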

[Header] section

The Study name will be used to identify the data set in the Data manager, so use an informative (though not too long) name. The Mating string describes the genetic model to be used. At present QGene understands the following mating designs:

r refers to a standard recombinant-inbred progeny, created by multiple-generation selfing but possibly retaining some heterozygosity.
Any sequence of b, d, s, and i or their upper-case counterparts may be provided as a mating string. These operations refer to backcrossing, doubled-haploid creation, selfing, and random intercrossing, and the string is assumed to start with operations applied to the F₁ generation. To specify a BC₁F₁ design, for example, write only b. An F₂ will be s, an F₃ ss, and a series of three backcrosses followed by a selfing bbbs. Did you backcross twice, self once, randomly intercross twice, and backcross again? No trouble: that's bbsiib.
QGene does not understand outcross designs at present.
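To make the mating-string notation concrete, here is a small Python sketch that spells out the operations encoded in a string. It is an illustration only, not QGene's own parser; the function name describe_mating_string is hypothetical. Because the text above does not say whether an upper-case R is accepted, the sketch recognizes only lower-case r for recombinant inbreds.

```python
OPERATIONS = {
    "b": "backcross",
    "d": "doubled haploid",
    "s": "self",
    "i": "random intercross",
}

def describe_mating_string(mating):
    """List, in order, the operations applied starting from the F1."""
    mating = mating.strip()
    if mating == "r":
        return ["recombinant inbred"]
    if not mating:
        raise ValueError("empty mating string")
    steps = []
    for letter in mating:
        operation = OPERATIONS.get(letter.lower())  # upper case is allowed
        if operation is None:
            raise ValueError(f"unknown mating operation {letter!r}")
        steps.append(operation)
    return steps

print(describe_mating_string("bbbs"))
# ['backcross', 'backcross', 'backcross', 'self']
```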

The Genotype symbols are those you are using to represent the parent 1 homozygote AA, parent 2 homozygote aa, heterozygote Aa, the dominant marker phenotypes a_ and A_, and missing data, in the order given here. The x characters in the example denote symbols that do not appear in your data. You may use any alphanumeric symbols you want, but your file will be clearest to you and others if you stick to the ABHCD- convention introduced by the Mapmaker program and observed by others since. In all backcross designs, the first symbol (here A), not the second, is assumed to represent the recurrent parent (corrected 10.21.2010).
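The positional meaning of the six genotype symbols can be sketched in Python as follows. This is an illustration rather than QGene code, and the function name genotype_symbol_map is hypothetical. It assumes that a symbol repeated within the string (such as the x in ABHxx-) is a placeholder for an unused class and can be dropped.

```python
from collections import Counter

# Genotype classes in the order required by the Genotype symbols entry.
GENOTYPE_CLASSES = ("AA", "aa", "Aa", "a_", "A_", "missing")

def genotype_symbol_map(symbols):
    """Map each symbol in use to the genotype class at its position."""
    if len(symbols) != len(GENOTYPE_CLASSES):
        raise ValueError("expected exactly six genotype symbols")
    counts = Counter(symbols)
    # Assumption: repeated symbols are placeholders, so they are skipped.
    return {
        symbol: genotype
        for symbol, genotype in zip(symbols, GENOTYPE_CLASSES)
        if counts[symbol] == 1
    }

print(genotype_symbol_map("ABHxx-"))
# {'A': 'AA', 'B': 'aa', 'H': 'Aa', '-': 'missing'}
```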

The Parent1 and Parent2 entries provide QGene with labels for QTL effect plots, where analysts wish to determine the parental origin of a superior QTL allele. They may also be used in other plots, such as marker-segregation plots or histograms showing parental means. If you don't provide names for the parents, QGene will default to A and B.

The eQTL entry in the header, if present, requires the entry for each marker to show a physical-map position (or a ? if this position is unknown). This is the only purpose of the eQTL line.

[Locus] section

Here the map and marker genotype (technically, marker phenotype) data are given. The first word on each line is the marker name, followed by the chromosome and cM position of the marker, followed by (if the eQTL header line is present) a physical-map position for the marker, and finally the marker data.
If you don't know the true chromosome or genetic or physical map position, you must still enter some value for these entries.
The marker data values need not be separated by whitespace, but will still be read properly if they are. QGene will verify that each marker is accompanied by the same number of genotype values, representing the individuals in the population. Please note that the marker data must not be interrupted by a newline (or return) character, even though the long lines may appear wrapped on your monitor in the example above.
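The layout of a marker line can be sketched in Python as below. This is only an illustration of the rules just described, not QGene's reader. The names parse_locus_line and check_genotype_counts are hypothetical, and the sketch assumes that marker and chromosome names contain no whitespace, that each genotype is a single character, and that physical positions are numeric unless given as ?.

```python
def parse_locus_line(line, eqtl=False):
    """Split one [Locus] line into name, chromosome, positions, and genotypes."""
    n_fixed = 4 if eqtl else 3  # eQTL files carry an extra physical position
    fields = line.split()
    if len(fields) <= n_fixed:
        raise ValueError(f"marker line has no genotype data: {line!r}")
    physical = None
    if eqtl and fields[3] != "?":  # ? marks an unknown physical position
        physical = float(fields[3])
    return {
        "name": fields[0],
        "chromosome": fields[1],
        "cM": float(fields[2]),
        "physical": physical,
        # Genotypes may or may not be separated by whitespace; join them.
        "genotypes": "".join(fields[n_fixed:]),
    }

def check_genotype_counts(loci):
    """Raise if the markers do not all have the same number of genotypes."""
    lengths = {len(locus["genotypes"]) for locus in loci}
    if len(lengths) > 1:
        raise ValueError(f"inconsistent genotype counts: {sorted(lengths)}")

loci = [parse_locus_line("M1 1 0 AABBH-"), parse_locus_line("M2 1 14.9 A A B B H -")]
check_genotype_counts(loci)
print(loci[1]["genotypes"])  # AABBH-
```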

[Trait] section

The first word on each line is the trait name.
For ordinary traits, the second word must be either N, O, or M -- indicating the trait to be nominal, ordinal, or metric.
For e-traits, the second word must be EN, EO, or EM. If QGene sees these words, it will expect three more "words" before the start of the trait data: the chromosome name or number and the start and end positions (in chromosome, not genome, coordinates) of the e-trait on the physical map. See the sample file (yeast-combined.qdf).
For both regular and eQTL files, the rest of the entries on a trait line are either numbers (for ordinal and metric traits) or alphanumeric strings (for nominal traits).
Missing trait data should be represented by periods (.), but hyphens (-) and question marks (?) are also accepted. Trait values should not be interrupted by newlines. If your population has more than about 250 individuals, you won't be able to prepare the .qdf data file directly in pre-2007 versions of Microsoft Excel, which allow only 256 columns per sheet. If you can prepare it with the markers and traits in columns instead of rows, you can use this Perl script to transpose your data. OK, it's a bit tedious, but that's all we have just now.
As with the marker data, QGene will verify that the numbers of trait values are consistent among traits and also with those of marker data.
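The trait-line rules above can be sketched in Python as follows. This is an illustration, not QGene's reader, and the function name parse_trait_line is hypothetical. It assumes the type codes are written in upper case exactly as listed, and it keeps the three e-trait position fields as text because their exact form is not specified here.

```python
TRAIT_TYPES = {"N", "O", "M", "EN", "EO", "EM"}
MISSING = {".", "-", "?"}

def parse_trait_line(line):
    """Split one [Trait] line into name, type, e-trait position, and values."""
    fields = line.split()
    if len(fields) < 2 or fields[1] not in TRAIT_TYPES:
        raise ValueError(f"missing or unknown trait type: {line!r}")
    name, kind, rest = fields[0], fields[1], fields[2:]
    position = None
    if kind.startswith("E"):
        if len(rest) < 3:
            raise ValueError(f"e-trait line lacks a physical position: {line!r}")
        position = tuple(rest[:3])  # (chromosome, start, end), kept as text
        rest = rest[3:]
    nominal = kind.endswith("N")
    values = []
    for token in rest:
        if token in MISSING:
            values.append(None)
        elif nominal:
            values.append(token)         # nominal values are arbitrary labels
        else:
            values.append(float(token))  # ordinal and metric values are numbers
    return {"name": name, "type": kind, "position": position, "values": values}

print(parse_trait_line("AMY_AR M 21.8 . 22.7")["values"])  # [21.8, None, 22.7]
```

The list of values returned for each trait can then be compared in length with the genotype strings of the markers, mirroring the consistency check that QGene performs.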

Notes

While you may load marker data with no trait data or vice versa, if your data set contains both kinds of data the marker data must come first.
You can't mix marker lines from a regular (non-eQTL) .qdf file with those in an eQTL .qdf file. However, because you may represent unknown physical positions of markers by ? and because all data in either kind of file can be subjected to all ordinary QTL analyses, there is no harm in converting all of your regular .qdf files to eQTL files by inserting this character and adding the eQTL header line.
While it's convenient to prepare your .qdf data file in Microsoft Excel (possibly using the utility script mentioned above), don't save it as an .xls file, which QGene won't be able to read! Save it as tab-separated text, .txt.
Very rarely QGene will refuse to open the QTL-analysis window for a data file you have successfully loaded. This happens when the expected size of some genotype class present in your data is smaller than 2, which would make QTL effects hard to estimate and lead to confusing error messages. This will usually happen only with absurdly small data sets.
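The conversion of a regular .qdf file to an eQTL file, described in the notes above, can be sketched in Python as below. This is an untested-against-QGene illustration, and the function name to_eqtl_qdf is hypothetical. It assumes that the eQTL line may appear anywhere in the header (here it is placed directly after the [Header] keyword) and that fields on a marker line may be separated by single spaces.

```python
KEYWORDS = ("[Header]", "[Locus]", "[Trait]")

def to_eqtl_qdf(text):
    """Return .qdf text with an eQTL header line and ? physical positions."""
    lines = text.splitlines()
    section, has_header = None, False
    for line in lines:  # first pass: is this already an eQTL file?
        stripped = line.strip()
        if stripped in KEYWORDS:
            section = stripped
            has_header = has_header or stripped == "[Header]"
        elif section == "[Header]" and stripped == "eQTL":
            return text  # nothing to convert
    if not has_header:
        raise ValueError("no [Header] section to hold the eQTL line")
    out, section = [], None
    for line in lines:  # second pass: rewrite header and marker lines
        stripped = line.strip()
        if stripped in KEYWORDS:
            section = stripped
            out.append(line)
            if stripped == "[Header]":
                out.append("eQTL")
        elif section == "[Locus]" and stripped:
            # Insert ? (unknown physical position) after the cM position.
            name, chromosome, cm, data = stripped.split(None, 3)
            out.append(f"{name} {chromosome} {cm} ? {data}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"

regular = "[Header]\nStudy name demo\n[Locus]\nM1 1 0 AABB\n[Trait]\nT1 M 1 2 . 4\n"
print(to_eqtl_qdf(regular))
```

Trait lines are left untouched, since ordinary N, O, and M traits are described above as valid in both kinds of file.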

Other permitted formats

QGene 3.0 (Macintosh)

You can load files in the old QGene format. In the Load data file dialog, select both the .data and the .map files after choosing the corresponding file type from the Files of type: dropdown menu. To make this multiple selection you'll need to hold down your Control (Windows, Unix) or Command (Macintosh) key.

QTL Cartographer

This option should work for saving, but not loading, .mcd files.