Using derived alleles to amplify selection signatures on intelligence

Author: Davide Piffer

The aim of this study is to identify polygenic selection signatures on intelligence across 26 populations from 1000 Genomes. In the next post, I will expand on this to include more populations (at the expense of the number of SNPs and their reliability).

Derived allele frequencies and background calibration

At a theoretical level, an ancestral allele is the allele that was carried by the last common ancestor of humans and other primates, whereas an allele is derived when it arose in the human lineage after the split from other primates. In practice, the ancestral allele is usually ascertained via comparison with chimpanzees. One limitation of this procedure is that if a mutation arose in chimpanzees after the split from humans, then the ancestral allele is not the chimp allele. To reduce this problem, 1000 Genomes infers ancestral alleles via alignment with 6 primate species (Ensembl, 2015).

Frequencies of derived alleles are not the same for all populations. Substantial DAF (derived allele frequency) differences across populations have been found, largely due to random drift and population bottlenecks but in part also shaped by different selection pressures (Henn et al., 2015). Non-African populations tend to have higher frequencies of derived alleles, and DAF is positively correlated with distance from Africa (Henn et al., 2015). There are also potential issues with GWAS. For example, a reviewer of a previous submission (https://topseudoscience.wordpress.com/2016/01/10/the-forbidden-paper-on-the-population-genetics-of-iq/) suggested that the minor alleles picked up by the GWAS (carried out on European subjects) tend to have higher frequencies in the GWAS reference population (i.e. Europeans) than the genome-wide average for minor alleles. Minor alleles are more likely to be derived alleles, hence these derived alleles will have higher frequencies among Europeans than among other populations. If derived alleles tend to have a positive effect, the frequency of alleles with positive effect may then be higher among Europeans than in other populations.

A novel methodology suggested here to deal with this confound is to create a variable which represents a good approximation to the average frequency of derived alleles picked up by GWA studies. For this purpose, the significant hits (N=693) from the largest GWAS of human stature to date (Wood et al., 2014) were grouped by allele status. The average frequency of derived alleles was computed for each group (alleles with a positive effect and alleles with a negative effect), and these were then averaged into a single variable, henceforth the DAF index (Table 1). Negative and positive alleles were given equal weight to avoid positive selection bias on the index.
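As a rough illustration of the equal-weight averaging described above, here is a minimal Python sketch. It assumes that the index is the mean of the two group means (derived alleles with positive effect and derived alleles with negative effect); the function name `daf_index` and the input frequencies are hypothetical and are not taken from the height GWAS.

```python
from statistics import mean

def daf_index(snps):
    """Equal-weight mean of derived allele frequencies across effect directions.

    snps: list of (derived_allele_frequency, effect_sign) tuples, where
    effect_sign is +1 if the derived allele increases the trait and -1 otherwise.
    """
    positive = [freq for freq, sign in snps if sign > 0]
    negative = [freq for freq, sign in snps if sign < 0]
    # Average each group first, then average the two group means, so that
    # the more numerous effect direction does not dominate the index.
    return mean([mean(positive), mean(negative)])

# Hypothetical frequencies for one population (illustration only, not real data).
example_snps = [(0.40, +1), (0.30, +1), (0.50, -1), (0.20, -1), (0.20, -1)]
index = daf_index(example_snps)
pooled = mean(freq for freq, _ in example_snps)  # simple pooled mean, for contrast
print(round(index, 3), round(pooled, 3))
```

With unequal group sizes the equal-weight index differs from the simple pooled mean, which is the point of weighting the two directions equally.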

Table 1. Mean derived allele frequencies and country IQ.

Population DAF index (height GWAS) IQ
Afr.Car.Barbados 0.298 83
US Blacks 0.309 85
Bengali Bangladesh 0.363 81
Chinese Dai 0.359
Utah Whites 0.382 99
Chinese, Beijing 0.365 105
Chinese, South 0.362 105
Colombian 0.372 83.5
Esan, Nigeria 0.286 71
Finland 0.385 101
British, GB 0.381 100
Gujarati Indian, Tx 0.365
Gambian 0.291 62
Iberian, Spain 0.378 97
Indian Telugu, UK 0.362
Japan 0.366 105
Vietnam 0.360 99.4
Luhya, Kenya 0.291 74
Mende, Sierra Leone 0.283 64
Mexican in L.A. 0.376 88
Peruvian, Lima 0.373 85
Punjabi, Pakistan 0.366 84
Puerto Rican 0.369 83.5
Sri Lankan, UK 0.362 79
Toscani, Italy 0.376 99
Yoruba, Nigeria 0.285 71

Using the DAF from the GWAS on human stature, we note that derived alleles (column 2) tend to be at lower frequencies among African than non-African populations, confirming the findings of a recent study (Henn et al., 2015) on different mutational load at common variants. The hypothesis that this phenomenon could mediate the association between IQ and polygenic scores is also supported by the positive correlation of the DAF index with population IQ (r=0.767). Note that the confounding effect would be present only when there are more derived than ancestral alleles with positive effect. If these are represented in equal proportions, the overrepresentation of derived alleles in some populations will be perfectly balanced by the underrepresentation of ancestral alleles and vice versa. However, in cases where there is a dramatic overrepresentation of derived alleles (such as the top significant hits in Rietveld et al., 2013), it is necessary to control for background DAF. Moreover, having a larger sample of SNPs (such as the 693 SNPs from the height GWAS) gives a more accurate estimate of the background DAF than a smaller subset of SNPs would.
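For readers who want to recompute the correlation between the DAF index and population IQ, the following Python sketch applies a plain Pearson correlation to the (DAF index, IQ) pairs of Table 1. The three populations without an IQ value are left out. The helper `pearson_r` is written here for illustration, and whether the original analysis handled missing values the same way is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# (DAF index, IQ) pairs copied from Table 1, in table order.
# Chinese Dai, Gujarati Indian and Indian Telugu are omitted (no IQ reported).
table1 = [
    (0.298, 83), (0.309, 85), (0.363, 81), (0.382, 99), (0.365, 105),
    (0.362, 105), (0.372, 83.5), (0.286, 71), (0.385, 101), (0.381, 100),
    (0.291, 62), (0.378, 97), (0.366, 105), (0.360, 99.4), (0.291, 74),
    (0.283, 64), (0.376, 88), (0.373, 85), (0.366, 84), (0.369, 83.5),
    (0.362, 79), (0.376, 99), (0.285, 71),
]

daf = [d for d, _ in table1]
iq = [q for _, q in table1]
r = pearson_r(daf, iq)
print(f"r = {r:.3f} (n = {len(table1)})")
```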

A DAF-calibrated polygenic score is then created by subtracting the DAF index from the average frequency of derived alleles with positive effect among the GWAS SNPs. Table 2 reports standardized scores, in descending order (sorted by the mean value of the three scores).
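The calibration and standardization steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the population labels and frequencies are hypothetical, the function names are illustrative, and the use of the population standard deviation (rather than the sample standard deviation) for the Z scores is an assumption, since the text does not specify it.

```python
from statistics import mean, pstdev

def calibrated_scores(derived_positive_freq, daf_index):
    """Mean frequency of derived alleles with positive effect minus the DAF index,
    computed separately for each population."""
    return {pop: derived_positive_freq[pop] - daf_index[pop]
            for pop in derived_positive_freq}

def z_scores(scores):
    """Standardize scores across populations (population SD assumed)."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {pop: (value - mu) / sd for pop, value in scores.items()}

# Hypothetical inputs for three populations (illustration only, not real data).
derived_positive = {"A": 0.50, "B": 0.40, "C": 0.45}
background_daf = {"A": 0.38, "B": 0.30, "C": 0.36}

raw = calibrated_scores(derived_positive, background_daf)
standardized = z_scores(raw)
for pop in sorted(standardized, key=standardized.get, reverse=True):
    print(pop, round(raw[pop], 3), round(standardized[pop], 3))
```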

Note that we could also apply the reverse procedure and calculate a background frequency of ancestral alleles (1-DAF). Then one could subtract that from the average frequency of ancestral alleles with positive effect. This is perhaps justified for traits such as height which were not subject to a dramatic increase during human evolution. However, since intelligence has been subject to a sharp increase and most intelligence-enhancing mutations are likely to be human-specific and not shared with our primate ancestors, by focusing on derived alleles one likely amplifies the signal of selection.
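The reverse procedure amounts to a one-line computation, sketched below in Python. The function name and the input values are hypothetical; the only relation taken from the text is that the background ancestral allele frequency is 1 minus the DAF index.

```python
def ancestral_calibrated_score(ancestral_positive_freq, daf_index):
    """Mean frequency of ancestral alleles with positive effect minus
    the background ancestral allele frequency (1 - DAF index)."""
    background_aaf = 1.0 - daf_index
    return ancestral_positive_freq - background_aaf

# Hypothetical values for one population (illustration only, not real data).
score = ancestral_calibrated_score(ancestral_positive_freq=0.55, daf_index=0.385)
print(round(score, 3))
```

A negative value means that ancestral alleles with positive effect are rarer in that population than the background would lead one to expect.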

Table 2. Background “DAF-free” polygenic scores (P.S.). Scores are reported as Z scores, sorted by their average in descending order.

Population P.S., Rietveld et al., 2014 P.S., p<5*10^-8 (N=9) P.S., 5*10^-8<=p<5*10^-7 (N=49) Average
Toscani, Italy 1.671 1.620 1.496 1.596
Iberian, Spain 1.567 1.646 1.391 1.535
Finland 1.358 1.645 1.113 1.372
British, GB 0.886 1.446 1.397 1.243
Vietnam 0.481 0.798 1.679 0.986
Japan 1.667 -0.230 1.124 0.854
Utah Whites 0.239 1.319 0.908 0.822
Chinese, Beijing 0.462 0.536 0.736 0.578
Chinese, South 0.494 0.221 0.893 0.536
Chinese Dai -0.229 0.485 0.414 0.223
Gujarati Indian, Tx -0.267 -0.135 -0.159 -0.187
Mende, Sierra Leone 0.133 -0.276 -0.453 -0.199
Colombian -0.847 0.672 -0.433 -0.202
Yoruba, Nigeria 0.309 -0.456 -0.551 -0.233
Puerto Rican -1.178 0.683 -0.220 -0.239
US Blacks -0.245 -0.353 -0.600 -0.399
Gambian 0.233 -0.770 -0.709 -0.415
Afr.Car.Barbados 0.187 -0.931 -0.922 -0.555
Esan, Nigeria -0.626 -0.444 -0.746 -0.605
Punjabi, Pakistan -0.760 -0.928 -0.164 -0.618
Bengali Bangladesh 0.262 -0.646 -1.532 -0.639
Luhya, Kenya -0.947 -1.044 -0.356 -0.782
Indian Telugu, UK -0.389 -1.558 -0.702 -0.883
Sri Lankan, UK -0.230 -1.177 -1.293 -0.900
Mexican in L.A. -2.045 -0.602 -0.508 -1.052
Peruvian, Lima -2.187 -1.523 -1.804 -1.838

The correlation between the average calibrated score and the average score obtained using the raw frequencies (total polygenic score = derived and ancestral alleles with positive effect) is r=0.889. The raw scores are reported in Table 3.

The calibrated scores are correlated with population IQ: r=0.462, 0.628 and 0.752 for the Rietveld et al., 2014 hits, the GWAS-significant hits (p<5*10^-8) and the other hits (5*10^-8<=p<5*10^-7), respectively.

The correlations between the mean calibrated and uncalibrated score and IQ are r=0.68 and 0.790, respectively.

Table 3. Total polygenic scores (Ancestral and derived alleles with positive effect), reported in descending order.

Population Rietveld et al., 2014 (N=67) p<5*10^-8 (N=10) 5*10^-8<=p<5*10^-7 (N=49) Average
Iberian, Spain 0.468 0.566 0.569 0.534
Toscani, Italy 0.467 0.562 0.568 0.532
Finland 0.465 0.573 0.530 0.523
British, GB 0.458 0.548 0.560 0.522
Utah Whites 0.459 0.534 0.530 0.507
Vietnam 0.459 0.491 0.565 0.505
Chinese, Beijing 0.471 0.468 0.555 0.498
Chinese, South 0.466 0.448 0.543 0.485
Puerto Rican 0.449 0.483 0.520 0.484
Colombian 0.445 0.476 0.519 0.480
Chinese Dai 0.454 0.463 0.520 0.479
Japan 0.474 0.399 0.554 0.476
Gujarati Indian, Tx 0.449 0.403 0.493 0.448
Mexican in L.A. 0.431 0.370 0.515 0.439
Punjabi, Pakistan 0.453 0.357 0.490 0.433
US Blacks 0.451 0.360 0.468 0.426
Mende, Sierra Leone 0.458 0.355 0.462 0.425
Yoruba, Nigeria 0.458 0.340 0.468 0.422
Esan, Nigeria 0.455 0.341 0.461 0.419
Bengali Bangladesh 0.450 0.368 0.435 0.418
Gambian 0.456 0.325 0.456 0.412
Afr.Car.Barbados 0.459 0.317 0.460 0.412
Sri Lankan, UK 0.458 0.323 0.445 0.409
Peruvian, Lima 0.427 0.288 0.498 0.404
Luhya, Kenya 0.450 0.292 0.463 0.402
Indian Telugu, UK 0.451 0.293 0.457 0.400

We can apply the reverse procedure to determine whether ancestral alleles contain signal above and beyond the background AAF (ancestral allele frequency) distribution. We can carry this out using the Rietveld et al., 2014 hits and the Rietveld et al., 2013 hits with 5*10^-8<=p<5*10^-7, but it is not possible to use the top 10 SNPs because they contain only 1 ancestral allele with positive effect. Table 4 reports the difference between the frequency of ancestral alleles with positive effect (AP) for Rietveld et al., 2014 and 2013 and the background AAF (AP-AAF), along with population IQ.

Table 4. Ancestral alleles with positive effect (AP) minus background AAF.

Population AP-AAF; Rietveld et al., 2014 AP-AAF; Rietveld et al., 2013 (5*10^-8<=p<5*10^-7) IQ
Afr.Car.Barbados -0.003 -0.079 83
US Blacks -0.025 -0.079 85
Bengali Bangladesh -0.074 -0.105 81
Chinese Dai -0.052 -0.025
Utah Whites -0.062 -0.030 99
Chinese, Beijing -0.022 0.030 105
Chinese, South -0.035 0.000 105
Colombian -0.075 0.012 83.5
Esan, Nigeria 0.007 -0.087 71
Finland -0.067 -0.039 101
British, GB -0.075 0.008 100
Gujarati Indian, Tx -0.069 -0.051
Gambian -0.009 -0.096 62
Iberian, Spain -0.058 0.026 97
Indian Telugu, UK -0.061 -0.096
Japan -0.034 0.012 105
Vietnam -0.051 0.008 99.4
Luhya, Kenya -0.005 -0.099 74
Mende, Sierra Leone 0.005 -0.098 64
Mexican in L.A. -0.097 0.011 88
Peruvian, Lima -0.104 0.038 85
Punjabi, Pakistan -0.052 -0.055 84
Puerto Rican -0.056 0.006 83.5
Sri Lankan, UK -0.043 -0.092 79
Toscani, Italy -0.061 0.019 99
Yoruba, Nigeria 0.000 -0.079 71

The correlation between AP-AAF (Rietveld et al., 2014) and IQ is negative: r=-0.472. The correlation between AP-AAF (Rietveld et al., 2013) and IQ is positive: r=0.742.

Conclusions

Controlling for different population DAFs does not substantially alter the overall pattern, although there is a slight reduction in fit (the correlation with population IQ drops from 0.79 to 0.68), and we do not know whether this is just a fluke. The far from perfect correlation with population IQ is due to the top place being occupied by Europeans instead of East Asians, and to a tendency for Latin Americans and South Asians (Indians, Bangladeshis) to score as low as sub-Saharan Africans. We also notice that ancestral alleles with positive effect do not have as strong a correlation with population IQ (r=-0.472 and 0.742; Table 4) as derived alleles with positive effect. This is expected on evolutionary grounds, as selection on intelligence should have acted on human-specific mutations rather than on ancestral variants shared with non-human primates.

References:

Ensembl, 2015: http://www.1000genomes.org/faq/where-does-ancestral-allele-information-your-variants-come

Davies, G., Armstrong, N., Bis, J. C., et al. (2015). Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53949).

Henn, B.M., Botigué, L.R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B.K., Martin, A.R., Musharoff, S., Cann, H., Snyder, M.P., Excoffier, L., Kidd, J.M., Bustamante, C.D. (2015). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. PNAS, published ahead of print December 28, 2015. doi:10.1073/pnas.1510805112

Rietveld, C.A., Medland, S.E., Derringer, J., Yang, J., Esko, T., Martin, N.W., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-1471. doi: http://doi.org/10.1126/science.1235488

Rietveld, C.A., Esko, T., Davies, G., Pers, T.H., Turley, P., Benyamin, B., et al. (2014). Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences, USA, 111, 13790-13794. doi:10.1073/pnas.1404623111

Wood, A.R., Esko, T., Yang, J., et al. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics, 46(11), 1173-1186.