Author: Davide Piffer
The aim of this study is to identify polygenic selection signatures on intelligence across the 26 populations of the 1000 Genomes Project. In the next post, I will expand on this to include more populations (at the expense of the number of SNPs and of reliability)!
Derived allele frequencies and background calibration
At a theoretical level, an ancestral allele is the allele that was carried by the last common ancestor between humans and other primates whereas an allele is derived when it arose in the human lineage after the split from other primates. In practice, this allele is usually ascertained via comparison with chimpanzees. One limitation of this procedure is that if a mutation arose in chimpanzees after the split from humans, then the ancestral allele is not the chimp allele. Thus, 1000 Genomes infers ancestral alleles via alignment with 6 primate species (Ensembl, 2015).
Frequencies of derived alleles are not the same for all populations. Substantial DAF (derived allele frequency) differences across populations have been found, largely due to random drift and population bottlenecks but in part also shaped by different selection pressures (Henn et al., 2015). Non-African populations tend to have higher frequencies of derived alleles, and DAF is positively correlated with distance from Africa (Henn et al., 2015). There are also potential issues with GWAS. For example, a reviewer of a previous submission (https://topseudoscience.wordpress.com/2016/01/10/the-forbidden-paper-on-the-population-genetics-of-iq/) suggested that the minor alleles picked up by a GWAS carried out on European subjects tend to have higher frequencies in the GWAS reference population (i.e. Europeans) than the genome-wide average frequency of minor alleles. Minor alleles are more likely to be derived alleles, hence these derived alleles will have higher frequencies among Europeans than in other populations. If derived alleles tend to have a positive effect, the frequency of alleles with positive effect may then be higher among Europeans than in other populations.
A novel methodology suggested here to deal with this confound is to create a variable that approximates the average frequency of derived alleles picked up by GWA studies. For this purpose, the significant hits (N = 693) from the largest GWAS of human stature to date (Wood et al., 2014) were grouped by allele status. The average frequency of derived alleles (including both alleles with a positive and alleles with a negative effect) was computed and then averaged into a single variable, henceforth the DAF index (Table 1). Negative and positive alleles were given equal weight to avoid a positive selection bias on the index.
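The construction of the DAF index can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name `daf_index`, the input layout and the toy frequencies are hypothetical, and the sketch assumes that "equal weight" means taking the unweighted mean of the two group means (derived alleles with positive effect and derived alleles with negative effect).

```python
from statistics import mean

def daf_index(pos_freqs, neg_freqs):
    """Background derived allele frequency for one population.

    pos_freqs: derived allele frequencies at SNPs where the derived
               allele has a positive effect on the GWAS trait.
    neg_freqs: derived allele frequencies at SNPs where the derived
               allele has a negative effect.
    The two group means are weighted equally, whatever the group sizes
    (an assumption about how the index is built).
    """
    return (mean(pos_freqs) + mean(neg_freqs)) / 2

# Toy, made-up frequencies for a single hypothetical population.
toy_pos = [0.40, 0.30, 0.50]   # group mean 0.40
toy_neg = [0.20, 0.40]         # group mean 0.30
index = daf_index(toy_pos, toy_neg)
print(round(index, 3))
```

With equal weighting the result is the midpoint of the two group means; pooling all five toy SNPs together would instead give more weight to the larger group.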
Table 1. Mean derived allele frequencies and country IQ.
|Population||Mean DAF (DAF index)||IQ|
|Gujarati Indian, Tx||0.365|
|Indian Telegu, UK||0.362|
|Mende, Sierra Leone||0.283||64|
|Mexican in L.A.||0.376||88|
|Sri Lankan, UK||0.362||79|
Using the DAF from the GWAS on human stature, we note that derived alleles (col. 2) tend to be at lower frequencies among African than among non-African populations, confirming the findings of a recent study (Henn et al., 2015) on different mutational load at common variants. The hypothesis that this phenomenon could mediate the association between IQ and polygenic scores is also supported by DAF’s positive correlation with population IQ (r=0.767). Note that the confounding effect would be present only when there are more derived positive than ancestral positive alleles. If these are represented in equal proportions, the overrepresentation of derived alleles in some populations will be perfectly balanced by the underrepresentation of ancestral alleles, and vice versa. However, in cases where there is a dramatic overrepresentation of derived alleles (such as the top significant hits in Rietveld et al., 2013), it is necessary to control for background DAF. Moreover, a larger sample of SNPs (such as the 693 SNPs from the height GWAS) gives a more accurate estimate of the background DAF than we could obtain from a smaller subset of SNPs.
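The balancing argument can be made explicit with a short derivation. This is a simplified sketch rather than part of the original analysis: it assumes that a population's background shift in derived allele frequency, written $\delta$, applies equally to every SNP, and the symbols $n_D$, $n_A$, $p_i$ and $S$ are introduced here only for illustration. Let $n_D$ be the number of hits whose positive allele is derived, $n_A$ the number whose positive allele is ancestral, and $p_i$ the derived allele frequency at SNP $i$. The total polygenic score (mean frequency of positive alleles) is

$$S = \frac{1}{n_D + n_A}\left[\sum_{i \in D} p_i + \sum_{j \in A} (1 - p_j)\right]$$

Replacing every $p$ by $p + \delta$ changes the score by

$$\Delta S = \frac{n_D\,\delta - n_A\,\delta}{n_D + n_A} = \frac{n_D - n_A}{n_D + n_A}\,\delta$$

so under this assumption the bias vanishes when $n_D = n_A$ and grows with the imbalance. For example, a set of 10 hits with 9 derived and 1 ancestral positive allele would give $\Delta S = \frac{9 - 1}{10}\,\delta = 0.8\,\delta$.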
A DAF-calibrated polygenic score is then created by subtracting the DAF index from the average frequency of derived alleles with positive effect from GWAS SNPs. Table 2 reports standardized scores, in descending order (sorted by the mean value of the two scores).
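A minimal sketch of the calibration step, under stated assumptions: the DAF index values are the ones listed in Table 1 for three populations, while the mean frequencies of derived positive alleles are made up for illustration. The helper names are hypothetical, and the use of the population standard deviation for the Z scores is also an assumption, since the text does not say which one was used.

```python
from statistics import mean, pstdev

def calibrated_scores(dp_mean, daf_index):
    """Subtract each population's background DAF from its mean frequency
    of derived alleles with positive effect."""
    return {pop: dp_mean[pop] - daf_index[pop] for pop in dp_mean}

def z_scores(values):
    """Standardize a dict of values to mean 0 and SD 1
    (population SD; this choice is an assumption)."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    return {pop: (v - mu) / sd for pop, v in values.items()}

# DAF index values as listed in Table 1.
daf = {
    "Mende, Sierra Leone": 0.283,
    "Mexican in L.A.": 0.376,
    "Sri Lankan, UK": 0.362,
}
# Hypothetical (made-up) mean frequencies of derived positive alleles.
dp = {
    "Mende, Sierra Leone": 0.300,
    "Mexican in L.A.": 0.350,
    "Sri Lankan, UK": 0.370,
}

cal = calibrated_scores(dp, daf)
z = z_scores(cal)
for pop, score in sorted(z.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pop}: {score:+.3f}")
```

Standardizing after the subtraction keeps the ranking of populations unchanged and only rescales the scores.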
Note that we could also apply the reverse procedure and calculate a background frequency of ancestral alleles (1-DAF). Then one could subtract that from the average frequency of ancestral alleles with positive effect. This is perhaps justified for traits such as height which were not subject to a dramatic increase during human evolution. However, since intelligence has been subject to a sharp increase and most intelligence-enhancing mutations are likely to be human-specific and not shared with our primate ancestors, by focusing on derived alleles one likely amplifies the signal of selection.
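The reverse procedure amounts to one subtraction per population, sketched below. The function name `ancestral_calibrated` and the example mean frequency of ancestral positive alleles (0.70) are hypothetical; the DAF value 0.283 is the Table 1 entry for the Mende.

```python
def ancestral_calibrated(ap_mean, daf_index):
    """Mean frequency of ancestral alleles with positive effect minus
    the background ancestral allele frequency (AAF = 1 - DAF)."""
    aaf = 1.0 - daf_index
    return ap_mean - aaf

# DAF index from Table 1 (Mende); ap_mean is a made-up illustration value.
result = ancestral_calibrated(ap_mean=0.70, daf_index=0.283)
print(round(result, 3))
```

A negative value means the ancestral positive alleles are rarer in that population than the ancestral background would predict.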
Table 2. Background “DAF-free” polygenic scores (P.S). Scores are reported as Z scores, sorted in descending order of the average.
|Population||P.S, Rietveld et al., 2014||P.S, p<5*10^-8||P.S, 5*10^-8<=p<5*10^-7||Average|
|Gujarati Indian, Tx||-0.267||-0.135||-0.159||-0.187|
|Mende, Sierra Leone||0.133||-0.276||-0.453||-0.199|
|Indian Telegu, UK||-0.389||-1.558||-0.702||-0.883|
|Sri Lankan, UK||-0.230||-1.177||-1.293||-0.900|
|Mexican in L.A.||-2.045||-0.602||-0.508||-1.052|
The correlation between this score and the one obtained using the raw frequencies (total polygenic score = derived and ancestral alleles with positive effect) is r=0.889. The raw scores are reported in Table 3.
The calibrated scores are correlated with population IQ: r=0.462, 0.628 and 0.752 for the Rietveld et al. (2014) hits, the GWAS-significant hits (p<5*10^-8) and the remaining hits (5*10^-8<=p<5*10^-7), respectively.
The correlations between the mean calibrated and uncalibrated score and IQ are r=0.68 and 0.790, respectively.
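The correlations reported in this post are Pearson coefficients across populations. A minimal, dependency-free sketch of the computation is given below; the function name `pearson_r` and the toy vectors are hypothetical and are not the data behind the reported values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy, made-up vectors: polygenic scores and IQ for four hypothetical populations.
toy_scores = [-1.0, -0.5, 0.2, 0.9]
toy_iq = [70, 85, 90, 100]
print(round(pearson_r(toy_scores, toy_iq), 3))
```

With only 26 populations (and fewer with IQ estimates), a coefficient computed this way has wide uncertainty, which is worth keeping in mind when comparing values such as 0.68 and 0.79.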
Table 3. Total polygenic scores (Ancestral and derived alleles with positive effect), reported in descending order.
|Population||Rietveld et al., 2014; N=67||p<5*10^-8; N=10||5*10^-8<=p<5*10^-7; N=49||Average|
|Gujarati Indian, Tx||0.449||0.403||0.493||0.448|
|Mexican in L.A.||0.431||0.370||0.515||0.439|
|Mende, Sierra Leone||0.458||0.355||0.462||0.425|
|Sri Lankan, UK||0.458||0.323||0.445||0.409|
|Indian Telegu, UK||0.451||0.293||0.457||0.400|
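The "Average" column of Table 3 can be checked directly from the three score columns. The sketch below assumes the average is the plain unweighted mean of the three scores (which reproduces the listed values for the rows shown here) and uses only those five rows; the variable names are illustrative.

```python
# Rows of Table 3: (Rietveld et al. 2014, GWAS-significant hits, remaining hits)
rows = {
    "Gujarati Indian, Tx": (0.449, 0.403, 0.493),
    "Mexican in L.A.": (0.431, 0.370, 0.515),
    "Mende, Sierra Leone": (0.458, 0.355, 0.462),
    "Sri Lankan, UK": (0.458, 0.323, 0.445),
    "Indian Telegu, UK": (0.451, 0.293, 0.457),
}

# Unweighted mean of the three scores, rounded to three decimals as in the table.
averages = {pop: round(sum(s) / len(s), 3) for pop, s in rows.items()}

# Sort in descending order of the average, as the table does.
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
for pop, avg in ranked:
    print(f"{pop}: {avg:.3f}")
```

Because the three sets contain different numbers of SNPs (67, 10 and 49), an unweighted mean gives each SNP in the 10-SNP set far more influence than a SNP in the larger sets.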
We can apply the reverse procedure to determine whether ancestral alleles contain signal above and beyond the background AAF (ancestral allele frequency) distribution. We can carry this out using the Rietveld et al. (2014) hits and the Rietveld et al. (2013) hits with 5*10^-8<=p<5*10^-7, but it is not possible to use the top 10 SNPs because they contain only 1 ancestral allele with positive effect. Table 4 reports the difference between the mean frequency of ancestral alleles with positive effect (AP) for Rietveld et al., 2014 and 2013 and the background AAF (AP-AAF), together with population IQ.
Table 4. Ancestral alleles with positive effect – AAF.
|Population||AP-AAF; Rietveld et al., 2014||AP-AAF; Rietveld et al., 2013 (5*10^-8<=p<5*10^-7)||IQ|
|Gujarati Indian, Tx||-0.069||-0.051|
|Indian Telegu, UK||-0.061||-0.096|
|Mende, Sierra Leone||0.005||-0.098||64|
|Mexican in L.A.||-0.097||0.011||88|
|Sri Lankan, UK||-0.043||-0.092||79|
The correlation between AP-AAF (Rietveld et al., 2014) and IQ is negative: r=-0.472. The correlation between AP-AAF (Rietveld et al., 2013) and IQ is positive: r=0.742.
Controlling for different population DAFs does not substantially alter the overall pattern, although there is a slight reduction in fit (the correlation with population IQ drops from 0.79 to 0.68); we do not know whether this is just a fluke. The far from perfect correlation with population IQ is due to the top place being occupied by Europeans instead of East Asians and to a tendency for Latin Americans and South Asians (Indians, Bangladeshi) to score as low as sub-Saharan Africans. We also notice that ancestral positive alleles do not have as strong a correlation with population IQ (r=-0.472 and 0.742; Table 4) as derived positive alleles. This is expected on evolutionary grounds, as selection on intelligence should have acted on human-specific mutations rather than on ancestral variants shared with non-human primates.
References
Davies, G., Armstrong, N., Bis, J. C., et al. (2015). Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53949).
Henn, B.M., Botigué, L.R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B.K., Martin, A.R., Musharoff, S., Cann, H., Snyder, M.P., Excoffier, L., Kidd, J.M., Bustamante, C.D. (2015). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. PNAS; published ahead of print December 28, 2015. doi:10.1073/pnas.1510805112
Rietveld, C.A., Medland, S.E., Derringer, J., Yang, J., Esko, T., Martin, N.W., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-1471. doi:10.1126/science.1235488
Rietveld, C.A., Esko, T., Davies, G., Pers, T.H., Turley, P., Benyamin, B., et al. (2014). Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences, USA, 111, 13790-13794. doi:10.1073/pnas.1404623111
Wood, A.R., Esko, T., Yang, J., et al. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics, 46(11), 1173-1186.