Okbay et al. (2016) reported 162 independent SNPs that reached genome-wide significance (P < 5 × 10⁻⁸) in the pooled-sex EduYears meta-analysis of the discovery and replication samples (N = 405,072). Of these, 161 SNPs were found in 1000 Genomes. These were divided into 32 subsets of 5 SNPs and factor analyzed. The correlations of the p value with the factor loadings and with each factor's correlation with population IQ (corr x pop IQ) were r = -0.273 and -0.008, respectively. Moreover, the two vectors (factor loadings and corr x pop IQ) were intercorrelated (r = 0.223), implying that the internal coherence of the factors is correlated with their predictive validity.
The scatterplot is shown in figure 1.
The top 4 significant SNP sets (N = 20 SNPs) were used to compute a polygenic score, and the 4 factor scores were averaged. These sets were chosen because they had the highest loadings, the highest correlation with population IQ, and the lowest p values (loading and correlation of 0.383 and 0.83, respectively, compared with averages of 0.22 and 0.11 for the entire dataset), suggesting more signal in the data.
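The scoring step above can be sketched as follows. This is a minimal illustration, assuming that the polygenic score is the unweighted mean frequency of the trait-increasing alleles across the selected SNPs (the text does not define it) and that each SNP set yields one factor score per population. The population labels and all numbers are hypothetical placeholders, not data from this analysis.

```python
from statistics import mean


def polygenic_score(increasing_allele_freqs):
    """Unweighted mean frequency of the trait-increasing alleles (assumed definition)."""
    return mean(increasing_allele_freqs)


def average_factor_score(set_scores):
    """Mean of the factor scores obtained from the separate SNP sets."""
    return mean(set_scores)


# Hypothetical per-population frequencies of trait-increasing alleles (4 SNPs for brevity).
allele_freqs = {
    "pop_A": [0.40, 0.55, 0.30, 0.62],
    "pop_B": [0.35, 0.50, 0.28, 0.47],
}

# Hypothetical factor scores, one per SNP set (4 sets, as in the text).
factor_scores = {
    "pop_A": [0.9, 0.4, 0.7, 0.2],
    "pop_B": [-0.9, -0.4, -0.7, -0.2],
}

pgs = {pop: polygenic_score(f) for pop, f in allele_freqs.items()}
avg_fs = {pop: average_factor_score(s) for pop, s in factor_scores.items()}

for pop in allele_freqs:
    print(pop, round(pgs[pop], 3), round(avg_fs[pop], 3))
```

An unweighted mean treats every SNP equally; weighting by GWAS effect size would be a different design choice and is not shown here.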
The largest GWAS of height to date (Wood et al., 2014) identified 697 SNPs that reached statistical significance for their association with human height. Factor analysis was carried out on 69 sets of 10 SNPs.
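The two partitions described so far (161 SNPs into sets of 5, and 697 SNPs into sets of 10) can be checked with a short sketch. It assumes the SNPs are ordered by ascending p value before being cut into consecutive sets, and that SNPs which do not fill a complete set are left out; the text does not state how the remainder was handled. The SNP identifiers are placeholders.

```python
def partition_snps(snps, set_size):
    """Split an ordered list of SNPs into consecutive full sets of `set_size`.

    Returns (sets, leftover). SNPs that do not fill a complete set are
    returned separately rather than silently dropped.
    """
    n_full = len(snps) // set_size
    sets = [snps[i * set_size:(i + 1) * set_size] for i in range(n_full)]
    leftover = snps[n_full * set_size:]
    return sets, leftover


# Placeholder identifiers standing in for SNPs sorted by ascending p value.
edu_snps = [f"edu_snp_{i}" for i in range(1, 162)]        # 161 SNPs
height_snps = [f"height_snp_{i}" for i in range(1, 698)]  # 697 SNPs

edu_sets, edu_left = partition_snps(edu_snps, 5)
height_sets, height_left = partition_snps(height_snps, 10)

print(len(edu_sets), len(edu_left))        # 32 sets, 1 SNP left over
print(len(height_sets), len(height_left))  # 69 sets, 7 SNPs left over
```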
The top 10 significant SNPs for height were chosen because they had a higher average factor loading (0.419) than the entire set (0.166); this was the third highest loading among the 69 sets of 10 SNPs. Polygenic and factor scores are reported in table 1. The factor scores are also reported in tables 2 and 3, in descending order.
Table 1. Factor and polygenic scores. Top significant SNPs for height and educational attainment (IQ) GWAS.
|Population||IQ polygenic score||IQ factor score||Height polygenic score||Height factor score|
|Gujarati Indian, Tx||0.386||-0.059||0.524||-0.333|
|Indian Telegu, UK||0.372||-0.127||0.521||-0.475|
|Mende, Sierra Leone||0.332||-1.475||0.624||1.278|
|Mexican in L.A.||0.36||0.143||0.502||-0.561|
|Sri Lankan, UK||0.373||0.025||0.5||-0.576|
Table 2. IQ factor scores sorted in descending order.
|Mexican in L.A.||0.143|
|Sri Lankan, UK||0.025|
|Gujarati Indian, Tx||-0.059|
|Indian Telegu, UK||-0.127|
|Mende, Sierra Leone||-1.475|
Table 3. Height factor scores sorted in descending order.
|Mende, Sierra Leone||1.278|
|Gujarati Indian, Tx||-0.333|
|Indian Telegu, UK||-0.475|
|Mexican in L.A.||-0.561|
|Sri Lankan, UK||-0.576|
There is a strong negative correlation between height and intelligence factor scores (r=-0.778).
The correlations of population IQ estimates (Piffer, 2015) with the average factor score and with the polygenic score were r = 0.923 and 0.867, respectively. The very high correlation for the factor score exceeds the 99% C.I. produced by a simulation using 200 iterations on random SNPs.
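A comparison against random SNPs of this kind can be sketched as a Monte Carlo null distribution. The details of the simulation are not given in the text, so everything below is an assumption: random SNP sets are scored by their mean allele frequency (a simplification; the reported test concerns factor scores), each score vector is correlated with population IQ, and the central 99% of the 200 resulting correlations is taken as the interval. The frequencies, IQ values, and population count are randomly generated placeholders.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def null_interval(freq_matrix, pop_iq, n_snps, n_iter=200, level=0.99, seed=0):
    """Empirical interval for r between random-SNP scores and population IQ.

    freq_matrix: one list per SNP, holding one allele frequency per population.
    Returns (lower, upper, sorted_null_correlations).
    """
    rng = random.Random(seed)
    n_pops = len(pop_iq)
    null = []
    for _ in range(n_iter):
        sample = rng.sample(freq_matrix, n_snps)  # random SNPs, without replacement
        scores = [sum(snp[p] for snp in sample) / n_snps for p in range(n_pops)]
        null.append(pearson(scores, pop_iq))
    null.sort()
    tail = (1 - level) / 2
    lower = null[round(tail * (n_iter - 1))]        # nearest-rank percentile
    upper = null[round((1 - tail) * (n_iter - 1))]
    return lower, upper, null


# Placeholder data: 500 random SNPs across 26 hypothetical populations.
data_rng = random.Random(1)
n_pops = 26
freq_matrix = [[data_rng.random() for _ in range(n_pops)] for _ in range(500)]
pop_iq = [data_rng.uniform(70, 105) for _ in range(n_pops)]

lo, hi, null = null_interval(freq_matrix, pop_iq, n_snps=20)

observed_r = 0.923  # factor-score correlation reported in the text
print(f"99% interval from random SNPs: [{lo:.3f}, {hi:.3f}]")
print("observed r above the interval:", observed_r > hi)
```

Because the placeholder frequencies are independent across populations, this null ignores shared ancestry; drawing the random SNPs from the real genotype data would preserve that structure.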
East Asians top the IQ rankings but are at the bottom of the height rankings. The opposite is true of African populations. Europeans have mid-high scores for both IQ and height, whereas South Asians and Hispanics/Latinos have mid to low scores on both traits.
The higher internal (i.e. factor loadings) and external (i.e. corr x IQ) coherence of factors extracted from more significant SNPs, together with the different patterns observed for height and IQ, suggests that these SNPs represent a signal of polygenic selection and not merely phylogenetic autocorrelation. Another important finding is that the signal is restricted to the most significant hits of each GWAS.
The individual scores depend on the choice of SNPs and on the computational method (e.g. polygenic vs factor scores), but the overall pattern is not affected: it is consistent across GWAS samples and publications.
Okbay, A., Beauchamp, J.P., Fontana, M.A., Lee, J., Pers, T.H., et al. (2016). Genome-wide association study identifies 74 loci associated with educational attainment. Nature, doi:10.1038/nature17671
Piffer, D. (2015). A review of intelligence GWAS hits: Their relationship to country IQ and the issue of spatial autocorrelation. Intelligence, 53, 43-50.
Wood, A.R., Esko, T., Yang, J., et al. (2014). Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics, 46(11), 1173-1186.