The laboratory domestication of zebrafish: from diverse populations to inbred substrains
The zebrafish (Danio rerio) is a model vertebrate widely used to study disease, development and other aspects of vertebrate biology. Most research is performed on laboratory strains, one of which has been fully sequenced to derive a reference genome. Laboratory strains are known to differ genetically from each other, but so far no genome-scale survey of variation between laboratory and wild zebrafish populations exists. Here we use restriction site-associated DNA sequencing (RAD-seq) to characterize three wild zebrafish lineages from a population genetic viewpoint, and to compare them to four common laboratory strains. For this purpose we combine new genome-wide sequence data obtained from natural samples in India, Nepal and Bangladesh with a previously published dataset. We measure nucleotide diversity, heterozygosity, allele frequency spectra and patterns of gene conversion, and find that wild fish are much more diverse than laboratory strains. Further, wild zebrafish show a clear signal of GC-biased gene conversion that is missing in laboratory strains. We also find that zebrafish populations in Nepal and Bangladesh are distinct from all the other strains studied, making them an attractive subject for future studies of zebrafish population genetics and molecular ecology. Finally, isolates of the same strain kept in different laboratories show a clear pattern of ongoing differentiation into genetically distinct substrains. Together, our findings broaden the basis for future genetic and evolutionary studies in Danio rerio.
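As an illustration of two of the summary statistics named above, the following minimal sketch computes nucleotide diversity and observed heterozygosity from biallelic genotype calls. It is not the pipeline used in this study: the function names, the 0/1/2 genotype coding (count of non-reference alleles per diploid individual, `None` for a missing call) and the toy data are assumptions made for the example only.

```python
def site_pi(alt_count, n_chromosomes):
    """Expected fraction of pairwise differences at one biallelic site,
    given alt_count non-reference alleles among n_chromosomes sampled."""
    if n_chromosomes < 2:
        return 0.0
    ref_count = n_chromosomes - alt_count
    return 2.0 * alt_count * ref_count / (n_chromosomes * (n_chromosomes - 1))


def nucleotide_diversity(sites, total_sites):
    """Average per-site pi over total_sites sequenced positions.
    `sites` holds only the variable positions; invariant positions
    contribute zero and are accounted for through total_sites."""
    total = 0.0
    for genotypes in sites:
        called = [g for g in genotypes if g is not None]
        total += site_pi(sum(called), 2 * len(called))
    return total / total_sites


def observed_heterozygosity(sites):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    het = 0
    called = 0
    for genotypes in sites:
        for g in genotypes:
            if g is None:
                continue
            called += 1
            if g == 1:
                het += 1
    return het / called if called else 0.0


# Hypothetical toy data: two variable sites scored in three individuals.
example_sites = [[0, 1, 2], [1, 1, None]]
pi = nucleotide_diversity(example_sites, total_sites=4)
h_obs = observed_heterozygosity(example_sites)
```

Under this coding, a population with lower diversity would yield smaller values of both statistics when the same set of sites is compared.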