Email: pifferdavide@gmail.com
I have recently published an updated version of my paper on polygenic selection pressures on human stature in F1000Research. I chose stature not because it’s a particularly interesting trait but for the simple reason that it’s very straightforward to measure and has the largest sample size available for genome-wide association studies (GWAS). Its genetic architecture is also very similar to that of IQ, because it’s highly polygenic and normally distributed.
As far as I know, F1000Research is the only other journal in the world to be “twice open”: open access and open peer review. The journal I founded (OpenPsych.net) is twice open but also free and more interactive, besides being based on a bottom-up process, in the sense that reviewers choose the paper instead of the editor choosing reviewers. Apart from this, let’s come to my study.
The biggest novelty is a correction I have introduced to deal with different population frequencies of derived alleles. Derived alleles are basically human-specific mutations that are assumed to have arisen after the chimp/homo lineages split. Of course these are not the only mutations that arose during human evolution. Remember that we are talking about polymorphisms, hence this automatically excludes all mutations that are fixed in the human population (no polymorphism, no SNP). The latter are substitutions ascertained via comparison with the chimp genome. Fixed mutations were once polymorphisms (a jargon term for SNP, which is even more alien for some people), but not all SNPs became fixed as some were lost due to random drift or purifying selection (the process that eliminates deleterious alleles).
There is a big controversy going on as to the causes of these frequency differences: are they the result of relaxed purifying selection due to population bottlenecks and decreased effective population size (Henn et al., 2015)? Or are they a result of an increased mutation rate after a bottleneck (Do et al., 2015)? Were all (or almost all) mutations deleterious, or were many of them adaptive (Harris, 2010)?
Besides demographic histories, there is also the problem that GWAS are usually carried out on Europeans, hence they tend to pick up derived alleles at higher frequency among European populations.
Be that as it may, I had to find a way to correct for this bias. In the case of the height GWAS (Wood et al., 2014), this was rather straightforward. There were 697 SNPs reaching genome-wide significance, which is a pretty big sample, and 691 of them could be aligned for ancestral/derived status using 1000 Genomes. Among the positive effect alleles, there were slightly more of the derived kind (370 vs. 321). Hence I computed two polygenic scores (mean population frequencies): one for the ancestral and one for the derived positive effect alleles. Then I created a composite score by averaging them. This gives equal weight to ancestral and derived alleles (Piffer, 2015b). The end result is that populations with higher baseline frequencies of ancestral alleles (such as Africans) obtain a higher score after this correction, because more weight is given to ancestral alleles.
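The correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `composite_score`, the input layout and the toy frequencies are hypothetical, not taken from the paper. It assumes that for each SNP we know the population frequency of the positive effect allele and whether that allele is derived or ancestral.

```python
from statistics import mean


def composite_score(freqs, is_derived):
    """Return (uncorrected, corrected) polygenic scores for one population.

    freqs:      frequency of the positive effect allele at each SNP
    is_derived: True where the positive effect allele is the derived one
    (Both names are hypothetical, chosen for this illustration.)
    """
    derived = [f for f, d in zip(freqs, is_derived) if d]
    ancestral = [f for f, d in zip(freqs, is_derived) if not d]
    if not derived or not ancestral:
        raise ValueError("need at least one derived and one ancestral allele")
    # Uncorrected score: plain mean over all SNPs, so the larger class
    # (derived alleles, in the height GWAS) dominates.
    uncorrected = mean(freqs)
    # Corrected score: mean of the two class means, so the ancestral and
    # derived classes get equal weight regardless of how many SNPs each has.
    corrected = (mean(derived) + mean(ancestral)) / 2
    return uncorrected, corrected


# Toy data (invented): three derived and one ancestral positive effect allele.
freqs = [0.8, 0.6, 0.7, 0.2]
is_derived = [True, True, True, False]
uncorrected, corrected = composite_score(freqs, is_derived)
print(round(uncorrected, 3), round(corrected, 3))
```

In the toy data the single ancestral allele counts for a quarter of the uncorrected score but for half of the corrected one, which is the sense in which the correction gives more weight to ancestral alleles.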
A corrected score of IQ-increasing derived alleles was also computed and averaged across the four polygenic scores (two from Rietveld et al., 2013; one from Rietveld et al., 2014; and one from Davies et al., 2015) affecting educational attainment or fluid intelligence.
Table 1. Polygenic scores.
Population | Corrected Height | Uncorrected Height | Corrected IQ | Uncorrected IQ |
Afr.Car.Barbados | 0.487 | 0.473 | -0.009 | 0.374 |
US Blacks | 0.490 | 0.476 | 0.018 | 0.400 |
Bengali Bangladesh | 0.485 | 0.476 | 0.002 | 0.406 |
Chinese Dai | 0.479 | 0.469 | 0.078 | 0.484 |
Utah Whites | 0.511 | 0.503 | 0.102 | 0.511 |
Chinese, Beijing | 0.479 | 0.470 | 0.087 | 0.501 |
Chinese, South | 0.482 | 0.472 | 0.075 | 0.483 |
Colombian | 0.493 | 0.484 | 0.062 | 0.478 |
Esan, Nigeria | 0.485 | 0.470 | 0.011 | 0.386 |
Finland | 0.505 | 0.497 | 0.122 | 0.531 |
British, GB | 0.508 | 0.499 | 0.114 | 0.524 |
Gujarati Indian, Tx | 0.486 | 0.476 | 0.031 | 0.434 |
Gambian | 0.486 | 0.471 | -0.001 | 0.375 |
Iberian, Spain | 0.500 | 0.491 | 0.121 | 0.534 |
Indian Telugu, UK | 0.488 | 0.478 | -0.032 | 0.370 |
Japan | 0.477 | 0.468 | 0.057 | 0.463 |
Vietnam | 0.480 | 0.470 | 0.105 | 0.507 |
Luhya, Kenya | 0.483 | 0.468 | -0.014 | 0.358 |
Mende, Sierra Leone | 0.487 | 0.472 | 0.026 | 0.396 |
Mexican in L.A. | 0.488 | 0.479 | 0.004 | 0.418 |
Peruvian, Lima | 0.484 | 0.475 | -0.043 | 0.378 |
Punjabi, Pakistan | 0.491 | 0.482 | -0.004 | 0.406 |
Puerto Rican | 0.493 | 0.484 | 0.066 | 0.482 |
Sri Lankan, UK | 0.487 | 0.478 | -0.024 | 0.384 |
Toscani, Italy | 0.501 | 0.492 | 0.128 | 0.537 |
Yoruba, Nigeria | 0.484 | 0.469 | 0.012 | 0.384 |
The correlation between the uncorrected height and IQ scores (r = 0.602) is somewhat higher than that between the corrected scores (r = 0.487).
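For readers who want to recompute these correlations, here is a minimal sketch that takes the values of Table 1 and applies a plain Pearson correlation across the 26 populations. It assumes that the reported figures are Pearson coefficients between the height and IQ columns. Because the table entries are rounded to three decimals, the output may differ slightly from the values given above.

```python
from math import sqrt

# Table 1 rows: (population, corrected height, uncorrected height,
#                corrected IQ, uncorrected IQ)
TABLE_1 = [
    ("Afr.Car.Barbados", 0.487, 0.473, -0.009, 0.374),
    ("US Blacks", 0.490, 0.476, 0.018, 0.400),
    ("Bengali Bangladesh", 0.485, 0.476, 0.002, 0.406),
    ("Chinese Dai", 0.479, 0.469, 0.078, 0.484),
    ("Utah Whites", 0.511, 0.503, 0.102, 0.511),
    ("Chinese, Beijing", 0.479, 0.470, 0.087, 0.501),
    ("Chinese, South", 0.482, 0.472, 0.075, 0.483),
    ("Colombian", 0.493, 0.484, 0.062, 0.478),
    ("Esan, Nigeria", 0.485, 0.470, 0.011, 0.386),
    ("Finland", 0.505, 0.497, 0.122, 0.531),
    ("British, GB", 0.508, 0.499, 0.114, 0.524),
    ("Gujarati Indian, Tx", 0.486, 0.476, 0.031, 0.434),
    ("Gambian", 0.486, 0.471, -0.001, 0.375),
    ("Iberian, Spain", 0.500, 0.491, 0.121, 0.534),
    ("Indian Telugu, UK", 0.488, 0.478, -0.032, 0.370),
    ("Japan", 0.477, 0.468, 0.057, 0.463),
    ("Vietnam", 0.480, 0.470, 0.105, 0.507),
    ("Luhya, Kenya", 0.483, 0.468, -0.014, 0.358),
    ("Mende, Sierra Leone", 0.487, 0.472, 0.026, 0.396),
    ("Mexican in L.A.", 0.488, 0.479, 0.004, 0.418),
    ("Peruvian, Lima", 0.484, 0.475, -0.043, 0.378),
    ("Punjabi, Pakistan", 0.491, 0.482, -0.004, 0.406),
    ("Puerto Rican", 0.493, 0.484, 0.066, 0.482),
    ("Sri Lankan, UK", 0.487, 0.478, -0.024, 0.384),
    ("Toscani, Italy", 0.501, 0.492, 0.128, 0.537),
    ("Yoruba, Nigeria", 0.484, 0.469, 0.012, 0.384),
]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r_corrected = pearson([r[1] for r in TABLE_1], [r[3] for r in TABLE_1])
r_uncorrected = pearson([r[2] for r in TABLE_1], [r[4] for r in TABLE_1])
print(f"corrected: r = {r_corrected:.3f}")
print(f"uncorrected: r = {r_uncorrected:.3f}")
```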
The corrected scores were ranked in descending order and are reported in Table 2.
Table 2. Corrected polygenic scores reported in descending order.
Population | Corrected Height | Population | Corrected IQ |
Utah Whites | 0.511 | Toscani, Italy | 0.128 |
British, GB | 0.508 | Finland | 0.122 |
Finland | 0.505 | Iberian, Spain | 0.121 |
Toscani, Italy | 0.501 | British, GB | 0.114 |
Iberian, Spain | 0.500 | Vietnam | 0.105 |
Puerto Rican | 0.493 | Utah Whites | 0.102 |
Colombian | 0.493 | Chinese, Beijing | 0.087 |
Punjabi, Pakistan | 0.491 | Chinese Dai | 0.078 |
US Blacks | 0.490 | Chinese, South | 0.075 |
Mexican in L.A. | 0.488 | Puerto Rican | 0.066 |
Indian Telugu, UK | 0.488 | Colombian | 0.062 |
Sri Lankan, UK | 0.487 | Japan | 0.057 |
Afr.Car.Barbados | 0.487 | Gujarati Indian, Tx | 0.031 |
Mende, Sierra Leone | 0.487 | Mende, Sierra Leone | 0.026 |
Gujarati Indian, Tx | 0.486 | US Blacks | 0.018 |
Gambian | 0.486 | Yoruba, Nigeria | 0.012 |
Bengali Bangladesh | 0.485 | Esan, Nigeria | 0.011 |
Esan, Nigeria | 0.485 | Mexican in L.A. | 0.004 |
Yoruba, Nigeria | 0.484 | Bengali Bangladesh | 0.002 |
Peruvian, Lima | 0.484 | Gambian | -0.001 |
Luhya, Kenya | 0.483 | Punjabi, Pakistan | -0.004 |
Chinese, South | 0.482 | Afr.Car.Barbados | -0.009 |
Vietnam | 0.480 | Luhya, Kenya | -0.014 |
Chinese, Beijing | 0.479 | Sri Lankan, UK | -0.024 |
Chinese Dai | 0.479 | Indian Telugu, UK | -0.032 |
Japan | 0.477 | Peruvian, Lima | -0.043 |
We can see that the ranking of corrected polygenic scores for height and IQ gives higher scores to Africans compared to the uncorrected scores, as predicted on the basis of their lower background derived frequencies. The bottom places for height are occupied by East Asian populations (Japanese, Chinese, Vietnamese), and the top places by North Europeans (White Americans, British, Finns), matching anthropometric descriptions and available statistics (https://en.wikipedia.org/wiki/Human_height). The bottom places of the IQ polygenic scores are occupied by South American, South Asian and African populations. It must be noted that the South Asian populations (Indian Telugu, Sri Lankan) are living in the UK and I am not aware of the existence of any reliable studies on their average IQ.
These results are encouraging because they provide discriminant validity (only a moderate correlation between the height and IQ polygenic scores, which can be explained by phylogenetic autocorrelation) and predictive validity (a moderately good fit with phenotypic population averages for IQ and height). A less than perfect fit is expected given that we have not sampled all the SNPs, that these represent only signals of polygenic pressure (thus not including all the non-additive effects), and that environment is important for these variables, as shown by the dramatic secular trend in height and IQ observed within Western countries.
A Piffer-Mantel test (Piffer, 2015) was carried out by calculating the distances between all pairs of populations for the polygenic scores. The height polygenic score was used as the dependent variable and Fst distances + the IQ score as the independent variables.
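The distance-regression step can be sketched as follows. This is not the published procedure, which is described in Piffer (2015); it is a minimal illustration that makes several assumptions: distances between polygenic scores are taken as absolute differences, the regression coefficients are standardized Betas, and all input numbers are invented toy values. The helper names are hypothetical, and any significance testing is left out.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))


def pairwise_abs_diff(scores):
    """Absolute score difference for every pair of populations."""
    return [abs(a - b) for a, b in combinations(scores, 2)]


def standardized_betas(y, x1, x2):
    """Standardized coefficients of y regressed on x1 and x2.

    Uses the closed-form solution for two predictors, expressed through
    the three pairwise correlations.
    """
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Invented toy input for four populations.
height_ps = [0.51, 0.50, 0.48, 0.485]
iq_ps = [0.10, 0.12, 0.08, 0.01]
# Invented Fst values, in the pair order produced by combinations():
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
fst = [0.01, 0.11, 0.15, 0.11, 0.15, 0.19]

beta_iq, beta_fst = standardized_betas(
    pairwise_abs_diff(height_ps),  # dependent variable
    pairwise_abs_diff(iq_ps),      # independent variable 1
    fst,                           # independent variable 2
)
print(f"Beta IQ = {beta_iq:.3f}, Beta Fst = {beta_fst:.3f}")
```

Pairwise distances are not independent observations, so the Betas from such a regression cannot be evaluated with ordinary standard errors.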
There was a positive Beta coefficient for the IQ PS (0.387), but the Beta for Fst was close to 0 (0.06) (Piffer, in press). The average value obtained using 100 polygenic scores from the SNPs (over 2 million) contained in Rietveld et al. (including the non-significant ones) is 0.06 with SD = 0.176. If we assume that the tiny deviation from 0 (0.06) was a result of chance or of residual signal contained in some of the Rietveld hits, we can calculate the deviation from null expectations: 0.387/0.176 = 2.19 standard deviations (Z).
A partial correlation (height PS, IQ PS, Fst) gave an almost identical result (r = 0.386).
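The arithmetic behind this Z value is simple enough to check by hand or with a short snippet. The sketch below assumes, as the text does, a null mean of 0; the function name `z_score` is only illustrative. The observed null mean of 0.06 could be passed as `null_mean` instead, which would give a smaller Z.

```python
def z_score(observed, null_mean, null_sd):
    """Deviation of an observed value from the null mean, in SD units."""
    return (observed - null_mean) / null_sd


# Values from the text: observed Beta = 0.387, null SD = 0.176,
# null mean treated as 0 (an assumption made in the text).
z = z_score(0.387, 0.0, 0.176)
print(z)
```

With the rounded inputs this comes out at roughly 2.2, in line with the figure reported above.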
To confirm that this is a sign of common selection pressures we’ll need more population samples but this is still a suggestive finding.
Conclusion
This article shows that it’s necessary to control for background frequencies of derived and ancestral alleles when computing population-level polygenic scores.
References:
Davies, G., Armstrong, N., Bis, J. C., et al. (2015). Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53949). Molecular Psychiatry, 20, 183-192. doi:10.1038/mp.2014.188
Do, R., Balick, B., Li, H., Adzhubei, I., Sunyaev, S., & Reich, D. (2015). No evidence that selection has been less effective at removing mutations in Europeans than Africans. Nature Genetics, doi:10.1038/ng.3186
Harris, E.E. (2010). Nonadaptive processes in primate and human evolution. Yearbook of Physical Anthropology, 53: 13-45.
Henn, B.M., Botigué, L.R., Peischl, S., Dupanloup, I., Lipatov, M., Maples, B.K., Martin, A.R., Musharoff, S., Cann, H., Snyder, M.P., Excoffier, L., Kidd, J.M., Bustamante, C.D. (2015). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. PNAS; published ahead of print December 28, 2015. doi:10.1073/pnas.1510805112
Piffer, D. (2015a). A review of intelligence GWAS hits: Their relationship to country IQ and the issue of spatial autocorrelation. Intelligence, 53, 43-50.
Piffer, D. (2015b). Evidence of polygenic selection on human stature inferred from spatial distribution of allele frequencies. F1000Research, 4:15.
Piffer, D. (in press). Polygenic selection of cognitive ability: polygenic scores predict average group intelligence. Is selection signal a function of GWAS significance?
Rietveld, C.A., Medland, S.E., Derringer, J., Yang, J., Esko, T., Martin, N.W., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-1471. doi: http://doi.org/10.1126/science.1235488
Rietveld, C.A., Esko, T., Davies, G., Pers, T.H., Turley, P., Benyamin, B., et al. (2014). Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences, USA, 111, 13790-13794. doi:10.1073/pnas.1404623111
Wood, A.R., Esko, T., Yang, J., et al. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics, 46(11), 1173-1186.