The standard format for importing marker, map, and trait data into QGene is the .qdf file, which is structured as follows:
Starting with version 4.3, which allows eQTL analysis, the .qdf
format can also accommodate e-trait data and the markers used for an
eQTL experiment. In this case one line of the header must consist of
the word eQTL.
Explanation of .qdf format
The three content keywords in square brackets tell QGene the content of the material following them, up to the next keyword.
[Header] section
The Study name will be used to identify the data set in the Data manager, so use an informative (though not too long) name. The Mating string describes the genetic model to be used. At present QGene understands the following mating designs:
- r refers to a standard recombinant-inbred progeny, created by multiple-generation selfing but possibly retaining some heterozygosity.
- Any sequence of b, d, s, and i, or their upper-case counterparts, may be provided as a mating string. These operations refer to backcrossing, doubled-haploid creation, selfing, and random intercrossing, and the string is assumed to start with operations applied to the F1 generation. To specify a BC1F1 design, for example, write only b. An F2 will be s, an F3 ss, and a series of three backcrosses followed by a selfing bbbs. Did you backcross twice, self once, randomly intercross twice, and backcross again? No trouble: that's bbsiib.
- QGene does not understand outcross designs at present.
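The mating-string rules above can be illustrated with a short Python sketch. It is not part of QGene; the function name describe_mating and the wording of the returned labels are assumptions made for this example. Only lower-case r is treated as the recombinant-inbred code, since the text does not say whether R is accepted.

```python
# Hypothetical helper, not part of QGene: expands a mating string into the
# sequence of operations it describes, starting from the F1 generation.
OPERATIONS = {
    "b": "backcross",
    "d": "doubled-haploid creation",
    "s": "selfing",
    "i": "random intercross",
}


def describe_mating(mating):
    """Return the list of operations encoded by a mating string."""
    if mating == "r":
        # Special case: standard recombinant-inbred progeny.
        return ["recombinant-inbred (multiple-generation selfing)"]
    if not mating:
        raise ValueError("empty mating string")
    steps = []
    for ch in mating:
        # Upper-case letters are accepted as synonyms of lower-case ones.
        op = OPERATIONS.get(ch.lower())
        if op is None:
            raise ValueError(f"unsupported mating operation: {ch!r}")
        steps.append(op)
    return steps


if __name__ == "__main__":
    for design in ("b", "s", "ss", "bbbs", "bbsiib"):
        print(design, "->", ", ".join(describe_mating(design)))
```

Rejecting unknown letters up front mirrors the note that outcross designs are not understood; how QGene itself reports such a string is not covered here.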
The Parent1 and Parent2 entries provide QGene with labels for QTL effect plots, where analysts wish to determine the parental origin of a superior QTL allele. The labels may also be used in other plots, for example those showing marker segregation or parental means on histograms. If you don't provide names for the parents, QGene will default to A and B.
The eQTL entry in the header, if present, requires the entry for each marker to show a physical-map position (or a ? if this position is unknown). This is the only purpose of the eQTL line.
[Locus] section
- Here the map and marker genotype (technically, marker phenotype) data are given. The first word on each line is the marker name, followed by the chromosome and cM position of the marker, followed by (if the eQTL header line is present) a physical-map position for the marker, and finally the marker data.
- If you don't know the true chromosome or genetic or physical map position, you must still enter some value for these entries.
- The marker data values need not be separated by whitespace, but will still be read properly if they are. QGene will verify that each marker is accompanied by the same number of genotype values, representing the individuals in the population. Please note that the marker data must not be interrupted by a newline (or return) character, even though on your monitor they may be wrapped in the above display.
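As a rough illustration of the field order just described, the following Python sketch splits one [Locus] line into its parts and checks that all markers carry the same number of genotype values. It is not QGene's own parser. The names parse_locus_line and same_genotype_count are hypothetical, the genotype codes in the usage lines are made up, and the sketch assumes that each genotype value is a single character (which is what would allow the values to be written without separating whitespace).

```python
# Hypothetical sketch of reading one [Locus] line; not QGene's own code.
def parse_locus_line(line, eqtl=False):
    """Split a locus line into name, chromosome, cM, physical position, genotypes."""
    fields = line.split()
    n_fixed = 4 if eqtl else 3  # eQTL files carry one extra position field
    if len(fields) <= n_fixed:
        raise ValueError("locus line has no marker data")
    physical = None
    if eqtl and fields[3] != "?":  # "?" marks an unknown physical position
        physical = fields[3]
    return {
        "name": fields[0],
        "chromosome": fields[1],
        "cM": fields[2],
        "physical": physical,
        # Assumption: one character per individual, so any whitespace
        # between values can simply be dropped.
        "genotypes": "".join(fields[n_fixed:]),
    }


def same_genotype_count(markers):
    """True if every parsed marker has the same number of genotype values."""
    return len({len(m["genotypes"]) for m in markers}) <= 1


if __name__ == "__main__":
    m1 = parse_locus_line("mk1 1 0.0 AABH AB")         # made-up codes
    m2 = parse_locus_line("mk2 1 12.5 ? ABAHBA", eqtl=True)
    print(m1["genotypes"], m2["physical"], same_genotype_count([m1, m2]))
```

Positions are kept as text rather than converted to numbers, because the format only requires that some value be present.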
[Trait] section
- The first word on each line is the trait name.
- For ordinary traits, the second word must be either N, O, or M -- indicating the trait to be nominal, ordinal, or metric.
- For e-traits, the second word must be EN, EO, or EM. If QGene sees these words, it will expect three more "words" before the start of the trait data: the chromosome name or number and the start and end positions (in chromosome, not genome, coordinates) of the e-trait on the physical map. See the sample file (yeast-combined.qdf).
- For both regular and eQTL files, the rest of the entries on a trait line are either numbers (for ordinal and metric traits) or alphanumeric strings (for nominal traits).
- Missing trait data should be represented by periods (.), but hyphens (-) and question marks (?) are
also accepted. Trait values should not be interrupted by newlines.
If you have a population of more than 250 or so individuals, you won't be able to prepare the .qdf data file directly in pre-2007 versions of Microsoft Excel, because those versions limit a worksheet to 256 columns. If you can prepare it with the markers and traits in columns instead of rows, you can use this Perl script to transpose your data.
OK, it's a bit tedious, but that's all we have just now.
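For readers who prefer Python, the transposition step can be sketched as below. This is not the Perl script referred to above, only a minimal stand-in with a hypothetical function name, and it assumes the data were saved as tab-separated text. Short rows are padded with empty cells so that no column is silently dropped.

```python
# Hypothetical stand-in for the transposition step; not the Perl script
# mentioned in the text. Works on tab-separated text.
def transpose_tab_text(text):
    """Swap rows and columns of tab-separated text."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad ragged rows so that zip() does not truncate to the shortest row.
    padded = [row + [""] * (width - len(row)) for row in rows]
    return "\n".join("\t".join(column) for column in zip(*padded)) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python transpose.py < columns.txt > rows.txt
    sys.stdout.write(transpose_tab_text(sys.stdin.read()))
```

The transposed output still needs the section keywords and header entries added before QGene can load it.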
- As with the marker data, QGene will verify that the numbers of
trait values are consistent among traits and also with those of marker
data.
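The trait-line rules above can be summarized in a short Python sketch. It is an illustration only, not QGene's parser. The name parse_trait_line is hypothetical, values are assumed to be separated by whitespace, and the type codes are matched exactly as written (N, O, M, EN, EO, EM), since the text does not say whether lower-case codes are accepted.

```python
# Hypothetical sketch of reading one [Trait] line; not QGene's own code.
MISSING = {".", "-", "?"}
SCALES = {"N": "nominal", "O": "ordinal", "M": "metric"}


def parse_trait_line(line):
    """Split a trait line into name, scale, optional e-trait location, values."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("trait line needs a name and a type code")
    name, code = fields[0], fields[1]
    is_etrait = len(code) == 2 and code[0] == "E"
    scale_code = code[1] if is_etrait else code
    if scale_code not in SCALES:
        raise ValueError(f"unknown trait type: {code!r}")
    location, first_value = None, 2
    if is_etrait:
        # Three extra words: chromosome, start, end (chromosome coordinates).
        chrom, start, end = fields[2:5]
        location, first_value = (chrom, start, end), 5
    values = []
    for token in fields[first_value:]:
        if token in MISSING:
            values.append(None)          # missing observation
        elif scale_code == "N":
            values.append(token)         # nominal: keep the label as text
        else:
            values.append(float(token))  # ordinal or metric: numeric
    return {"name": name, "scale": SCALES[scale_code],
            "location": location, "values": values}


if __name__ == "__main__":
    print(parse_trait_line("height M 1.5 . 2.0 ?"))
    print(parse_trait_line("gene1 EM 4 100 900 0.3 - 1.2"))
```

Comparing the length of values with the number of genotype values per marker gives the same consistency check that QGene is described as performing.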
Notes
- While you may load marker data with no trait data or vice versa, if your data set contains both kinds of data the marker data must come first.
- You can't mix marker lines from a regular (non-eQTL) .qdf file with those in an eQTL .qdf file. However, because you may represent unknown physical positions of markers by ?
and because all data in either kind of file can be subjected to all
ordinary QTL analyses, there is no harm in converting all of your
regular .qdf files to eQTL files by inserting this character and adding the eQTL header line.
- While it's convenient to prepare your .qdf data file in Microsoft Excel (possibly using the utility script mentioned above), don't save it as an .xls file, which QGene won't be able to read! Save it as tab-separated text, .txt.
- Very rarely QGene will refuse to open the QTL-analysis window for a data file you have successfully loaded. This happens when the expected size of some genotype class present in your data is smaller than 2, so that QTL analysis would result in difficulty in estimating effects, leading to confusing error messages. This will usually happen only with absurdly small data sets.
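The conversion of a regular .qdf file to an eQTL file, mentioned in the notes above, might be scripted along these lines. This is a sketch under several assumptions that the text does not confirm: each section keyword sits on a line of its own, fields are tab-separated, the eQTL line may be placed at the end of the header, and the [Header] section is followed by another section. The function name to_eqtl and the header line in the usage example are hypothetical.

```python
# Hypothetical sketch: add the eQTL header line and a "?" physical position
# to every marker line of a regular .qdf file. Not part of QGene.
def to_eqtl(lines):
    """Return a converted copy of `lines` (one string per file line)."""
    if any(line.strip() == "eQTL" for line in lines):
        return list(lines)  # already an eQTL file; leave untouched
    out, section = [], None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if section == "[header]":
                out.append("eQTL")  # assumed placement: end of the header
            section = stripped.lower()
            out.append(line)
        elif section == "[locus]" and stripped:
            # name, chromosome, cM, then the marker data kept as one piece
            name, chrom, cm, data = stripped.split(None, 3)
            out.append("\t".join([name, chrom, cm, "?", data]))
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    regular = [
        "[Header]",
        "demo header line",          # placeholder, not real header syntax
        "[Locus]",
        "mk1\t1\t0.0\tAABH",         # made-up genotype codes
        "[Trait]",
        "height\tM\t1.5\t2.0\t.\t3.1",
    ]
    print("\n".join(to_eqtl(regular)))
```

Trait lines are passed through unchanged, since only marker lines differ between the two kinds of file.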
Other permitted formats
QGene 3.0 (Macintosh)
You can load files in the old QGene format. In the Load data file dialog, select both the .data and the .map files after choosing the corresponding file type from the Files of type: dropdown menu. To make this multiple selection you'll need to hold down your Control (Windows, Unix) or Command (Macintosh) key.