The forbidden paper on the population genetics of IQ

Author: Davide Piffer

I submitted a paper to Intelligence in December 2015. After about three weeks, I received a rejection letter from the new editor (Richard Haier). What I found particularly irritating was one reviewer’s recommendation to reject without opportunity for revision. In my opinion, this stance is justified only in extreme instances of fatal flaws; otherwise it just reveals a hidden agenda or a general close-minded attitude. My policy has been for some time to post the reviews of rejected papers, because I do not believe that reviews should be hidden. Transparency is very important, particularly in science. Let the general public decide whose arguments provide a better fit to the data, not the dismissive attitude of a reviewer. The reviews are attached in the appendix and the paper can be downloaded from here.

This review was obviously written by an expert in the field, although it is not devoid of some generic comments that are irritating because they leave the question that they raise unanswered and they do not provide any references to back up their claims (e.g. “one research group with control of an extremely large family cohort is currently working on a manuscript documenting that years of education is subject to a very peculiar form of confounding”). Really? Which “peculiar form of confounding”? Which large family cohort and which manuscript? Which research group?

Isn’t it funny how a reviewer can afford to be generic and not provide any justification or references to back up their claims, but the authors have got to take extreme pains to make sure that everything is backed by sound evidence? Why this double standard? Perhaps because reviewers work for free, and nobody likes to do unpaid work.

There are a few serious comments that deserve consideration. For example: “A GWAS of Europeans is more likely to detect SNPs with high minor allele frequencies. The minor allele is usually the derived allele, and thus the use of SNPs ascertained to have low p-values in a GWAS of Europeans will lead to an overrepresentation of SNPs with high derived allele frequencies specifically in Europeans. If the derived allele tends to have a positive effect (as the authors claim), this is certainly an issue that needs to be carefully addressed.”

This argument is not explained very clearly. It’s another example of how the reviewers expect crystal clarity from the authors, but they can get away with making rather obscure comments that leave room for different interpretations.

This could mean two things. 1) That the GWAS tends to select trait-increasing alleles that are derived. However, upon closer inspection this turns out to be fallacious. It wrongly assumes that GWAS hits always have a positive beta, which is not the case. Positive and negative betas are randomly distributed across GWAS hits. Thus, when the GWAS selects a hit with a negative beta, which should be more likely to be the minor allele and hence derived, the allele with a positive beta (in this case, IQ enhancing) is going to be more likely to be the major allele and hence ancestral.

Nonetheless, I counted the number of derived alleles among the alleles increasing height in the latest and biggest GWAS meta-analysis of variation in human stature. This would give us an estimate of the GWAS bias towards picking derived alleles.

The derived to total allele count ratio is 370/691 or 53.5%. Assessing the statistical significance of this result is problematic because many SNPs are in linkage disequilibrium and violate the assumption that they represent independent observations. It’s likely that this is just a statistical fluke, but nonetheless we can give the reviewer’s fallacious reasoning the benefit of the doubt.

The derived to total allele count ratio for intelligence enhancing alleles is 42/66 or 63.6%. The good news is that here we can apply binomial probability to calculate statistical significance, because the SNPs were pruned for LD by the authors (Rietveld et al., 2014). Assuming that each allele has a 50% chance of being derived, the probability p that X (the number of derived alleles) >= 42 is 0.0179.

However, to be fair we have got to include the knowledge acquired by the height GWAS and assume that there is a bias for derived alleles in the GWAS results. The best estimate of this bias is equal to the percentage of derived alleles in excess of 50% in the height meta-analysis, that is 3.5%.

A binomial calculation assuming a background frequency of 53.5% will yield a p value of 0.062, which is not extremely strong but not too shabby either.
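The two binomial calculations above can be reproduced with a few lines of R. This is only a sketch using the counts quoted in the text (42 derived alleles out of 66 SNPs, and 370/691 from the height meta-analysis); the variable names are mine.

```r
# Counts quoted in the text: 42 derived alleles among 66 LD-pruned SNPs.
n_snps <- 66
n_derived <- 42

# P(X >= 42) under a 50% null. The upper tail must start at 42,
# so pbinom() is given 41 with lower.tail = FALSE.
p_null_50 <- pbinom(n_derived - 1, n_snps, 0.5, lower.tail = FALSE)

# Same tail probability with the null shifted to the proportion of derived
# alleles observed in the height meta-analysis (370/691, about 53.5%).
p_background <- 370 / 691
p_null_height <- pbinom(n_derived - 1, n_snps, p_background, lower.tail = FALSE)

# binom.test() with a one-sided alternative returns the same p value
# as the first pbinom() call.
p_check <- binom.test(n_derived, n_snps, p = 0.5, alternative = "greater")$p.value

print(c(null_50 = p_null_50, null_height = p_null_height))
```

The only design choice worth noting is the off-by-one in `pbinom()`: with `lower.tail = FALSE` it returns P(X > q), so q has to be 41 to include the observed count of 42.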

However, the more likely interpretation of the reviewer’s comment is that the minor alleles picked by the GWAS tend to have higher frequencies among the GWAS reference population (i.e. Europeans) than the average genome-wide frequencies of minor alleles. Minor alleles are more likely to be derived alleles, hence these derived alleles will have higher frequencies among Europeans compared to other populations. Since derived alleles tend to have a positive effect, the frequency of alleles with positive effect will tend to be higher among Europeans than other populations. It was hard work translating the reviewer’s obscure words into an understandable sentence.

We can again give the reviewer the benefit of the doubt and see if derived alleles with a positive effect have higher frequencies among Europeans compared to ancestral alleles with a positive effect, and if their average frequencies are still correlated to population IQs.

Table 1. Top 69 cognitive performance significant SNPs in Rietveld et al. (2014).

 

| Population | Derived positive (DP) | Derived negative (DN) | Total PS | IQ | DP-DN |
|---|---|---|---|---|---|
| Afr.Car.Barbados | 0.317 | 0.301 | 0.459 | 83 | 0.015 |
| US Blacks | 0.324 | 0.334 | 0.451 | 85 | -0.010 |
| Bengali Bangladesh | 0.383 | 0.437 | 0.450 | 81 | -0.054 |
| Chinese Dai | 0.373 | 0.411 | 0.454 | | -0.037 |
| Utah Whites | 0.401 | 0.444 | 0.459 | 99 | -0.043 |
| Chinese, Beijing | 0.387 | 0.388 | 0.471 | 105 | -0.001 |
| Chinese, South | 0.384 | 0.398 | 0.465 | 105 | -0.014 |
| Colombian | 0.381 | 0.447 | 0.445 | 83.5 | -0.066 |
| Esan, Nigeria | 0.297 | 0.279 | 0.455 | 71 | 0.018 |
| Finland | 0.415 | 0.451 | 0.465 | 101 | -0.037 |
| British, GB | 0.406 | 0.455 | 0.458 | 100 | -0.049 |
| Gujarati Indian, Tx | 0.380 | 0.435 | 0.449 | | -0.055 |
| Gambian | 0.310 | 0.300 | 0.456 | 62 | 0.010 |
| Iberian, Spain | 0.410 | 0.436 | 0.468 | 97 | -0.026 |
| Indian Telugu, UK | 0.375 | 0.423 | 0.450 | | -0.048 |
| Japan | 0.399 | 0.400 | 0.474 | 105 | -0.001 |
| Vietnam | 0.382 | 0.412 | 0.459 | 99.4 | -0.030 |
| Luhya, Kenya | 0.299 | 0.296 | 0.450 | 74 | 0.003 |
| Mende, Sierra Leone | 0.301 | 0.277 | 0.458 | 64 | 0.024 |
| Mexican in L.A. | 0.374 | 0.473 | 0.431 | 88 | -0.100 |
| Peruvian, Lima | 0.370 | 0.477 | 0.427 | 85 | -0.108 |
| Punjabi, Pakistan | 0.376 | 0.418 | 0.453 | 84 | -0.042 |
| Puerto Rican | 0.375 | 0.425 | 0.449 | 83.5 | -0.050 |
| Sri Lankan, UK | 0.377 | 0.405 | 0.458 | 79 | -0.028 |
| Toscani, Italy | 0.409 | 0.437 | 0.466 | 99 | -0.028 |
| Yoruba, Nigeria | 0.305 | 0.285 | 0.458 | 71 | 0.020 |
| r x IQ | 0.833 | 0.654 | 0.413 | | -0.297 |

 

It is indeed the case, as the reviewer had predicted, that derived alleles have a higher frequency among Europeans, whether they have a positive effect or not. But the question is: Are derived alleles with a positive effect better predictors of population IQ than derived alleles with a negative effect? If the alleles contain signal that goes above and beyond that produced by being derived, the correlation between derived positive and country IQ should be stronger than that between derived negative and country IQ. In other words, this would tell us that the GWAS found signal above and beyond that provided simply by (ancestral vs derived) allele status.

The correlation between DP (derived alleles with positive effect) and country IQ is r= 0.83. The correlation between country IQ and AP (ancestral alleles with positive effect) is r=-0.65.

This implies that the signal in the total polygenic score (average frequency of all derived and ancestral alleles together) is partly driven by the derived alleles. However, a closer inspection of the matrix will tell us that the correlation between derived alleles with negative effect and IQ is r=0.65, which is lower than that between derived alleles with positive effect and population IQ (r=0.83).

Clearly, more SNPs are required to validate this picture.

Let’s look at the hits found by Rietveld et al. (2013). To avoid post-hoc classifications, I used the same groups of SNPs that I used for the analysis in my paper. There were 10 genome-wide significant SNPs (p<5*10^-8). However, 9/10 alleles with positive effect were derived, so there were not enough ancestral positive alleles to make a comparison. There were 99 SNPs with a p value between 5*10^-7 and 5*10^-8. Among these, derived and ancestral alleles are equally represented (DA:AA=49:50).

The same procedure applied to the Rietveld et al. (2014) SNPs to control for differential distribution of derived alleles due to GWAS artifact or bottleneck effects (Henn et al., 2015) will be employed here. Alleles with a positive effect are divided into two sub-groups: those that are derived and those that are ancestral. Reversing their frequencies (1-n) yields the frequencies of derived negative and ancestral negative alleles, respectively. These are shown in table 2.

Table 2. Educational attainment SNPs with a p value between 5*10^-7 and 5*10^-8 from Rietveld et al. (2013).

| Population | Derived positive (DP) | Derived negative (DN) | IQ | DP-DN |
|---|---|---|---|---|
| Afr.Car.Barbados | 0.302 | 0.377 | 83 | -0.075 |
| US Blacks | 0.328 | 0.387 | 85 | -0.059 |
| Bengali Bangladesh | 0.339 | 0.468 | 81 | -0.129 |
| Chinese Dai | 0.425 | 0.384 | | 0.041 |
| Utah Whites | 0.471 | 0.412 | 99 | 0.059 |
| Chinese, Beijing | 0.446 | 0.336 | 105 | 0.111 |
| Chinese, South | 0.451 | 0.362 | 105 | 0.088 |
| Colombian | 0.399 | 0.360 | 83.5 | 0.039 |
| Esan, Nigeria | 0.298 | 0.372 | 71 | -0.074 |
| Finland | 0.483 | 0.423 | 101 | 0.060 |
| British, GB | 0.492 | 0.372 | 100 | 0.120 |
| Gujarati Indian, Tx | 0.405 | 0.417 | | -0.012 |
| Gambian | 0.305 | 0.387 | 62 | -0.082 |
| Iberian, Spain | 0.490 | 0.352 | 97 | 0.138 |
| Indian Telugu, UK | 0.376 | 0.458 | | -0.082 |
| Japan | 0.466 | 0.355 | 105 | 0.111 |
| Vietnam | 0.485 | 0.353 | 99.4 | 0.132 |
| Luhya, Kenya | 0.321 | 0.390 | 74 | -0.069 |
| Mende, Sierra Leone | 0.309 | 0.381 | 64 | -0.072 |
| Mexican in L.A. | 0.400 | 0.365 | 88 | 0.035 |
| Peruvian, Lima | 0.337 | 0.335 | 85 | 0.001 |
| Punjabi, Pakistan | 0.405 | 0.421 | 84 | -0.016 |
| Puerto Rican | 0.406 | 0.363 | 83.5 | 0.043 |
| Sri Lankan, UK | 0.349 | 0.454 | 79 | -0.105 |
| Toscani, Italy | 0.492 | 0.357 | 99 | 0.136 |
| Yoruba, Nigeria | 0.306 | 0.364 | 71 | -0.058 |
| r x IQ | 0.891 | -0.255 | | 0.848 |

First, we can see that the reviewer’s claim that derived alleles have higher frequencies among Europeans is debunked, as this is true only for derived alleles with a positive effect, but not those with a negative effect, which actually reach higher frequencies among South Asians (e.g. Indian Telugu: 0.458) but are otherwise equally distributed across Africans (e.g. Esan Nigeria: 0.372) and Europeans (e.g. British: 0.372). What is their correlation with population IQ? If GWAS hits really had higher frequencies among Europeans than Africans simply because (according to the reviewer) of a methodological artifact, this should apply irrespective of the effect on educational attainment. In other words, positive and negative effect derived alleles should be found at higher frequencies among Europeans. What about the polygenic scores’ correlations to population IQ? Again, if the polygenic scores’ correlation to population IQ were driven only by derived allele status, alleles with a positive effect on educational attainment should not be more strongly correlated to population IQ than alleles with a negative effect.

We can see that the correlation between derived positive polygenic score and IQ is 0.89, much higher than that between derived negative and IQ (-0.25). This suggests that the alleles pick up selection signal that goes above and beyond random drift or effects of GWAS artifact. Another interesting result is that ancestral alleles with a positive effect do not seem to predict population IQ (r=0.25), confirming my prediction that intelligence enhancing alleles should be overrepresented among human-specific mutations. If we assume that human-specific mutations with a positive effect on IQ at the individual level are the least likely to contain false positives, we can consider this as the best measure of selection pressure strength across populations. We can see that this index peaks among Europeans (highest scores for Italians and British= 49%) and East Asians (e.g. Chinese Beijing= 44.6%). South Asians have lower scores (Bangladesh= 33.9%), and scores are even lower in sub-Saharan African populations (around 30%).

Perhaps another measure of selection would be the difference between derived positive and derived negative (dp-dn) allele frequencies. This would take into account the DAF (derived allele frequencies) distributions due to population bottlenecks and drift. We can see that even this measure is substantially correlated to population IQ (r=0.85). With this methodology, it turns out that the (dp-dn) score for the Rietveld et al. (2014) 69 SNPs is weakly but negatively correlated to population IQ (r=-0.297).
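For readers who want to check the three correlations in the bottom row of Table 2, here is a minimal R sketch. The frequencies and IQ values are transcribed from Table 2 in table order; the three populations without a reported IQ are entered as NA and dropped pairwise. The vector names are mine. Table 2 reports r=0.891, r=-0.255 and r=0.848 for these three quantities.

```r
# Derived positive (dp) and derived negative (dn) frequencies and population IQ,
# transcribed from Table 2 in the same row order. NA = no IQ reported.
dp <- c(0.302, 0.328, 0.339, 0.425, 0.471, 0.446, 0.451, 0.399, 0.298,
        0.483, 0.492, 0.405, 0.305, 0.490, 0.376, 0.466, 0.485, 0.321,
        0.309, 0.400, 0.337, 0.405, 0.406, 0.349, 0.492, 0.306)
dn <- c(0.377, 0.387, 0.468, 0.384, 0.412, 0.336, 0.362, 0.360, 0.372,
        0.423, 0.372, 0.417, 0.387, 0.352, 0.458, 0.355, 0.353, 0.390,
        0.381, 0.365, 0.335, 0.421, 0.363, 0.454, 0.357, 0.364)
iq <- c(83, 85, 81, NA, 99, 105, 105, 83.5, 71,
        101, 100, NA, 62, 97, NA, 105, 99.4, 74,
        64, 88, 85, 84, 83.5, 79, 99, 71)

# Difference score: derived positive minus derived negative frequency.
dp_dn <- dp - dn

# Pearson correlations with IQ, using only populations that have an IQ value.
r_dp    <- cor(dp,    iq, use = "complete.obs")
r_dn    <- cor(dn,    iq, use = "complete.obs")
r_dp_dn <- cor(dp_dn, iq, use = "complete.obs")

print(round(c(dp = r_dp, dn = r_dn, dp_dn = r_dp_dn), 3))
```

Because the inputs are rounded to three decimals, small discrepancies from the tabled values are to be expected.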

Another way to validate a measure is to see how well it replicates across datasets: Are derived allele frequencies from one dataset correlated to derived allele frequencies in the other? What we are interested in here is whether derived alleles with a positive effect on intelligence have similar frequencies across datasets. If they do, this suggests that they are picking up more than random noise.

It turns out that the correlation between derived positive allele frequencies in the two datasets (Rietveld et al., 2013 and Rietveld et al., 2014) is positive (r= 0.88). On the other hand, the correlation between the derived negative alleles is near zero (r= 0.08). This suggests that alleles with a positive effect on IQ pick up selection signal, whereas the alleles with a negative effect on IQ represent noise. If these represented mere noise, then also the method of subtracting dn from dp would not be sound. Again, more data are needed to shed light on this issue.

 

A somewhat puzzling finding is the dramatic drop in the percentage of derived alleles with a positive effect when the p value rises above the conventional GWAS significance threshold (p<5*10^-8). 9/10 of the GWAS significant hits were derived. However, only about 50% of those belonging to the second group (p value between 5*10^-7 and 5*10^-8) were derived. The dramatic drop is perhaps an artifact of adopting a dichotomous approach, dividing the groups by a conventional threshold. One would have to correlate the p value to the derived vs ancestral allele status. This was done in my paper using the 67 alleles found by Rietveld et al. (2014) to increase cognitive performance, and a slightly positive effect was found. Using the 109 SNPs (top 10 + 99 making up the second group) yields a correlation r= -0.019. Since derived alleles are coded as 1 and ancestral ones as 0, this implies that there is a very weak association between derived status and low p value. However, this is driven entirely by the top 10 SNPs. A limitation of this analysis is that the SNPs were not pruned for LD and hence are not independent, so if there are clusters of SNPs around a certain p value, this will bias the derived allele count, giving undue weight to alleles in that p value range. Bigger samples of SNPs pruned for LD will be required to replicate the association between derived status and positive effect found in the Rietveld et al. (2014) data set.

 

The reviewer thinks that the derived alleles are not necessarily enriched for intelligence enhancing signal and stated:  “it is not necessarily the case that an association between derived status and a positive effect points toward selection increasing the mean of the trait. Such selection can actually lead to the opposite association (between derived status and a negative effect) at certain allele frequencies.”

I must confess that I do not understand this argument. Surely if a mutation unique to the human lineage (arisen after the most recent common ancestor of all living humans) had been detrimental, making humans less intelligent than primates, this would have been selected against, hence disappearing from the genome? Purifying selection is much more common than positive selection because random mutations are usually deleterious.

Selection increasing the mean of the trait does actually produce an increase in derived alleles when there has been positive directional selection for the trait in a species. We know that this is the case for humans, as cranial capacity and behavioral complexity have dramatically increased in the last 4 million years and modern humans are much more intelligent than non-human primates. Selection must necessarily have increased the intelligence-enhancing mutations, hence the derived alleles.

The reviewer’s argument would apply to height, as there has not really been an increase in stature, at least from Homo erectus to Homo sapiens sapiens. And that’s indeed what we found: height increasing alleles are only marginally enriched for derived alleles (53%), a finding that is likely a fluke.

Another comment worthy of consideration is this: “the extrapolation to non-European populations is still problematic because the accuracy of the polygenic score declines in such populations as a result of differing LD patterns (Scutari et al., 2015).”

Differences in LD should simply reduce the frequency differences between populations at the tag SNPs, compared to the real causal SNPs. This is due to a phenomenon called “attenuation”. Indeed, correction for attenuation is used “to rid a correlation coefficient from the weakening effect of measurement error” (Jensen, 1998). This scenario holds when the frequency differences between tag and causal SNPs are due to random error, so that the mean frequency of the cognitive ability alleles is equal to the (genome-wide) background frequency (which, for mathematical reasons, is 50%). If instead there is a systematic bias, so that the mean frequency of the causal alleles is lower than the background frequency, then attenuation will reduce observed population-level frequency differences at tag alleles. As the reviewer says, “A GWAS of Europeans is more likely to detect SNPs with high minor allele frequencies”. Hence, the average frequency of the causal alleles identified by the GWAS tends to be lower than 50%. That this is true can be seen from the tables displaying the average frequencies of educational attainment increasing alleles, which tend to be much lower than 50%, especially at the lowest p values.

For example, let the average frequency of causal alleles be 40% in the reference European population. We also know that the average genome-wide frequency of alleles is 50% in all populations (the frequencies of the two alleles at a SNP always sum to 100%). If LD breaks down at some loci so that the tag SNP is uncorrelated to the causal SNP, the tag SNP in the non-European population will have a bias towards higher frequency compared to that of the European population. Hence, differences in LD should cause non-European populations to have higher frequencies at the tag SNPs (that is, the “GWAS hits”) than European populations and should reduce frequency differences among these populations, as all of them tend to be closer to 50%.
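The arithmetic of this example can be written out as a toy calculation. The sketch below is not part of the original analysis: the function `expected_tag_freq` and the fractions of loci with broken LD are hypothetical, and it simply takes the premises stated above at face value (tag SNPs that stay in LD keep the 40% frequency, tag SNPs that lose LD revert to the 50% background).

```r
# Hypothetical helper: expected mean frequency at the tag SNPs when a fraction
# f_broken of loci has lost LD with the causal SNP. Loci still in LD keep the
# causal frequency; loci with broken LD are assumed to sit at the background.
expected_tag_freq <- function(causal_freq, f_broken, background = 0.5) {
  (1 - f_broken) * causal_freq + f_broken * background
}

# Illustrative (made-up) fractions of loci where LD has broken down.
f_broken <- c(0, 0.25, 0.5, 1)
tag_freq <- expected_tag_freq(causal_freq = 0.40, f_broken = f_broken)

print(data.frame(f_broken = f_broken, expected_tag_freq = tag_freq))
```

Under these assumptions the expected tag frequency moves from 40% towards 50% as more loci lose LD, which is the direction of bias described in the paragraph above.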

So this is the opposite of what the reviewer said: “Now suppose that in a different population the SNPs are uncorrelated, the reference allele at the causal SNP has a somewhat higher frequency, and the reference allele at the tag SNP has a much lower frequency. Then the inference made from comparing the polygenic scores of the two populations is exactly the opposite of the truth.”

The reviewer’s argument can perhaps apply to a single SNP, but there is no reason why there should be a systematic bias in the direction predicted by that argument, and sadly the reviewer just assumes that this is so, without providing any justification.

References:

Jensen, A.R. (1998). The g Factor: The Science of Mental Ability. Praeger, Connecticut, USA.

Henn et al. (2015). Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. PNAS.

 

Appendix:

 

Reviewers’ comments:

 

Reviewer #1: I recommend the rejection of this manuscript without opportunity for revision. It does not meet the very high standards for demonstrations of natural selection acting to differentiate modern human populations that have been set by recent publications (Turchin et al., 2012; Robinson et al., 2015). Here I will only detail a few of the manuscript’s shortcomings.

 

The authors do not address the possibility that the GWAS results of Rietveld et al. (2013) are contaminated by confounding (cognition- or education-affecting environmental variables that happen to be correlated with genetic variation). Although the original publications of the SSGAC deal with this issue to some extent, they do not come up to the standards set in the papers that I have cited in the previous paragraph. Furthermore, one research group with control of an extremely large family cohort is currently working on a manuscript documenting that years of education is subject to a very peculiar form of confounding. Until these results are published and well absorbed, any naive inferences regarding the basis of racial differences should be regarded with skepticism.

 

The authors also do not address the issue of ascertainment bias. A GWAS of Europeans is more likely to detect SNPs with high minor allele frequencies. The minor allele is usually the derived allele, and thus the use of SNPs ascertained to have low p-values in a GWAS of Europeans will lead to an overrepresentation of SNPs with high derived allele frequencies specifically in Europeans. If the derived allele tends to have a positive effect (as the authors claim), this is certainly an issue that needs to be carefully addressed.

 

True, it may be that ascertainment bias is less of an issue when all SNPs regardless of p-value are used to construct a polygenic score. But the extrapolation to non-European populations is still problematic because the accuracy of the polygenic score declines in such populations as a result of differing LD patterns (Scutari et al., 2015). An example will make this clear. Suppose that two SNPs in perfect LD in Europeans have quantitatively close positive reference betas. Now suppose that in a different population the SNPs are uncorrelated, the reference allele at the causal SNP has a somewhat higher frequency, and the reference allele at the tag SNP has a much lower frequency. Then the inference made from comparing the polygenic scores of the two populations is exactly the opposite of the truth. We can conclude from this that the use of polygenic scores to infer the causes of intercontinental differences requires much more care than given to it here.

 

Because stabilizing selection (favoring the “golden mean,” as the authors put it) also eliminates genetic variation, higher dispersion of allele frequencies across populations is by itself not diagnostic of directional selection.

 

The fact that a large fraction of the enhancing alleles reported by Rietveld et al. (2014) SNPs are derived does not mean very much. First, as it is likely that many of the SNPs are not causal, the relationship between derived alleles at different polymorphic sites must be addressed. Second, even if it be assumed that these are the causal SNPs, it is not necessarily the case that an association between derived status and a positive effect points toward selection increasing the mean of the trait. Such selection can actually lead to the opposite association (between derived status and a negative effect) at certain allele frequencies.

 

A general comment is that the appropriateness of much of the hypothesis testing in this paper is difficult to judge. The stochastic model justifying a particular statistical test is usually unclear. Is the source of randomness inaccuracy in the GWAS estimates? The inherent stochasticity of evolution?

 

Robinson, M. R., Hemani, G., Medina-Gomez, C., et al. (2015). Population differentiation of height and body mass index across Europe. Nature Genetics, 47, 1357-1362.

 

Scutari, M., Mackay, I., & Balding, D. J. Using genetic distance to infer the accuracy of genomic prediction. arXiv:1509.00415.

 

Turchin, M. C., Chiang, C. W. K., Palmer, C. D., Sankararaman, S., Reich, D., GIANT Consortium, & Hirschhorn, J. N. (2012). Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics, 44, 1015-1019.

 

Reviewer #2: I found this paper extremely reader-unfriendly. Starting from the title – which is cumbersome, to the tables – which lack meaningful explanations and notes, to references to specialist concepts – that require much further explanation for non-expert reader, to the general structure of the write up – the paper needs extensive revisions before it can be considered for publication for Intelligence.

The paper is full of poorly justified conclusions. For example, in the abstract, the author claims: ‘Cognitive-enhancing SNPs were significantly enriched for derived alleles (64%), that is human-specific mutations that originated after the split from the most recent common ancestor between humans and other primates.’ However, the Derived vs ancestral alleles section on page 9 does not present the relevant analyses in details, and therefore the conclusion is not justified.

The paper is full of sentences that would require further clarifications for non-expert audience. For example: ‘Differences in allele frequencies between populations can be created by directional selection when the strength and/or direction of selection on the phenotype differs among populations. In this case it is also characterized as diversifying selection, in contrast to stabilizing selection which tends to favor the “golden mean.”’

OR

‘Diversifying selection is most commonly measured using the Fst index at or around single loci (Holsinger & Weir, 2009).’ This needs to be explained further.

OR

‘Some SNPs had opposite betas on the two outcome variables (yes/no college completion and total years of education).’ This requires further discussion.

I could go on giving examples of unclear sentences, but I believe that the paper needs to be worked on – the author should consult with non-expert (in this specific area) intelligence researchers – to arrive at a clearer, more streamlined and better explained manuscript. All analyses require further explanations, perhaps, with specific examples, that would talk the reader through every step of the analyses.

 

 

 

 


Agreement between Q-Q plot and Shapiro-Wilk test of normality

Davide Piffer – 03/08/2015

Q-Q plots are commonly used to detect deviations from the normal distribution. This can be done visually or – more formally – calculating the correlation between the theoretical and the empirical distributions.

Another widely used test of normality is the Shapiro-Wilk test. This produces a coefficient W with a value of 1 corresponding to perfect normality (no deviation from the theoretical distribution) and lower values representing deviations from normality.

My goal was to determine the degree of agreement between the estimates produced by these two methods. In order to achieve this, I computed the correlation between the theoretical quantiles (x axis) and the empirical quantiles (y axis) of the Q-Q plots and carried out the Shapiro-Wilk test on several continuous variables. Then, I correlated the W value to the Q-Q plot correlation coefficient.
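The procedure just described can be wrapped in a small helper. This is a sketch on simulated data, not on the study's vectors: the function name `qq_vs_shapiro` and the two simulated samples are my own, and the actual code used on the real variables is in the appendix.

```r
# Hypothetical helper: returns the Q-Q plot correlation and Shapiro-Wilk W
# for a numeric vector, after dropping missing values.
qq_vs_shapiro <- function(x) {
  x <- x[!is.na(x)]
  qq <- qqnorm(x, plot.it = FALSE)       # theoretical (x) and sample (y) quantiles
  c(qq_cor = cor(qq$x, qq$y),            # correlation between the two axes
    W = unname(shapiro.test(x)$statistic))
}

# Simulated examples (not the study data): one normal and one skewed sample.
set.seed(1)
normal_sample <- rnorm(200)
skewed_sample <- rexp(200)

res_normal <- qq_vs_shapiro(normal_sample)
res_skewed <- qq_vs_shapiro(skewed_sample)

print(rbind(normal = res_normal, skewed = res_skewed))
```

Applying the helper to each column of a data frame (for instance with `sapply`) would give the pairs of values that are then correlated with each other.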

Methods

Variables were taken from two files (NineHitsBetaFst_B.csv and Factors.csv) in the data set I used for the population genetics study of intelligence. The vectors represent allele frequencies or factors derived from allele frequencies via factor analysis (Piffer, 2015).

Data files containing the vectors can be downloaded from: https://osf.io/jt73x/

Results of the analysis are reported in this spreadsheet: https://docs.google.com/spreadsheets/d/1fg2evimqFlx2PqxopcfiJy99i4d6NgZw2HnsBeIzgUc/edit?usp=sharing

R was used to carry out the analysis.

R Code is in the appendix.

Results

The correlation between Q-Q xy and Shapiro-Wilk W was r=0.993 (N=19; p<0.001).

Figure 1. Relationship between Q-Q plot xy correlation and Shapiro-Wilk W.


The relationship between the two variables can be approximately described by this formula:

1-W =~ 2(1-Corr Q-Q plot),

e.g. 9SNPsGIDist: Q-Q corr= 0.952 and Shapiro-W= 0.905. This can be seen from table 1.

Table 1. Relationship between the two methods (1-x).

| 1-Corr Q-Q | 1-W | (1-W)/(1-Corr Q-Q) |
|---|---|---|
| 0.0322736 | 0.0661 | 2.048113628 |
| 0.0355605 | 0.07253 | 2.039622615 |
| 0.0231317 | 0.04782 | 2.067292936 |
| 0.0230315 | 0.04779 | 2.074984261 |
| 0.0471659 | 0.09458 | 2.005262276 |
| 0.0264781 | 0.05437 | 2.05339507 |
| 0.0881257 | 0.16912 | 1.919076955 |
| 0.0270037 | 0.05495 | 2.034906328 |
| 0.0243319 | 0.04994 | 2.052449665 |
| 0.0221553 | 0.04577 | 2.065871372 |
| 0.0267268 | 0.05474 | 2.048131464 |
| 0.0651654 | 0.12811 | 1.965920565 |
| 0.0228832 | 0.04754 | 2.077506642 |
| 0.0308334 | 0.06289 | 2.039671266 |
| 0.0267176 | 0.05426 | 2.030871036 |
| 0.0276686 | 0.05681 | 2.053230015 |
| 0.0328761 | 0.07879 | 2.396573803 |
| 0.0728126 | 0.15363 | 2.109937016 |
| 0.0384661 | 0.08824 | 2.293967935 |

There is indeed a slight tendency for the ratio to fall as departures from normality get bigger (i.e. with strong departures from normality, 1-W is slightly less than twice as big as 1-corr Q-Q, whereas it is slightly more than twice as big when departures from normality are small).
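One possible reading of the factor of two, which is my own addition and not something tested in this post: the Shapiro-Francia statistic, a close relative of W, is the squared correlation of the normal Q-Q plot. If W is assumed to behave roughly like the squared Q-Q correlation r^2, then 1-W ≈ 1-r^2 = (1-r)(1+r), which is close to 2(1-r) when r is near 1 and falls below it as r drops. The sketch below checks this against three rows of Table 1; it is an approximation under that assumption, not an exact identity for W.

```r
# Three rows transcribed from Table 1: 1 - Q-Q correlation and 1 - W.
one_minus_r <- c(0.0322736, 0.0471659, 0.0881257)
one_minus_W <- c(0.0661, 0.09458, 0.16912)

r <- 1 - one_minus_r

# Under the assumption W ~ r^2, the predicted value of 1 - W is 1 - r^2,
# and the predicted ratio (1 - W) / (1 - r) is 1 + r, which is below 2.
pred_one_minus_W <- 1 - r^2
pred_ratio <- 1 + r
obs_ratio <- one_minus_W / one_minus_r

print(data.frame(one_minus_r, one_minus_W, pred_one_minus_W,
                 obs_ratio, pred_ratio))
```

The predicted ratio 1+r decreases as the Q-Q correlation drops, which matches the direction of the tendency noted above, although the observed ratios for small departures sit slightly above 2 rather than slightly below it.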

Discussion

There is a very strong agreement between two commonly used methods to test for normality of data. An advantage of the Shapiro-Wilk test is that it provides a test of the null hypothesis that the population is normally distributed. However, p values have many issues, besides being affected by sample size such that a very large sample size will always result in rejection of the null hypothesis even in the presence of tiny deviations from normality (Kirkegaard, 2014).

References:

Kirkegaard, E. (2014). W values from the Shapiro-Wilk test visualized with different datasets. http://emilkirkegaard.dk/en/?p=4452

Piffer, D. (2015). A review of intelligence GWAS hits: their relationship to country IQ and the issue of spatial autocorrelation. Figshare, http://dx.doi.org/10.6084/m9.figshare.1393160

Appendix

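#Dataset NineHitsBetaFst_B

NineHitsBetaFst_B=read.csv("NineHitsBetaFst_B.csv")#loads the data file (assumes it was downloaded from the OSF link into the working directory)

newdata3=na.omit(NineHitsBetaFst_B)#drops rows with missing values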

qqChr21Fst=qqnorm(newdata3$Chr21.Fst)#creates Q-Q plot and assigns it a name

cor(qqChr21Fst$x,qqChr21Fst$y)#computes correlation between x and y axes of Q-Q plot

shapiro.test(newdata3$Chr21.Fst) # Shapiro-Wilk test

qqChr1Fst=qqnorm(newdata3$Chr1.Fst)

cor(qqChr1Fst$x,qqChr1Fst$y)

shapiro.test(newdata3$Chr1.Fst)

qqIQdist=qqnorm(newdata3$IQ.distances)

cor(qqIQdist$x,qqIQdist$y)

shapiro.test(newdata3$IQ.distances)

qqX4.SNP=qqnorm(newdata3$X4.SNPs.GI.distances)

cor(qqX4.SNP$x,qqX4.SNP$y)

shapiro.test(newdata3$X4.SNPs.GI.distances)

qqX9.SNP=qqnorm(newdata3$X9.SNPs.GI.distances)

cor(qqX9.SNP$x,qqX9.SNP$y)

shapiro.test(newdata3$X9.SNPs.GI.distances)

qqset1=qqnorm(newdata3$Set1)

cor(qqset1$x,qqset1$y)

shapiro.test(newdata3$Set1)

qqset2=qqnorm(newdata3$Set2)

cor(qqset2$x,qqset2$y)

shapiro.test(newdata3$Set2)

qqset3=qqnorm(newdata3$Set3)

cor(qqset3$x,qqset3$y)

shapiro.test(newdata3$Set3)

qqset4=qqnorm(newdata3$Set4)

cor(qqset4$x,qqset4$y)

shapiro.test(newdata3$Set4)

qqset5=qqnorm(newdata3$Set5)

cor(qqset5$x,qqset5$y)

shapiro.test(newdata3$Set5)

qqset6=qqnorm(newdata3$Set6)

cor(qqset6$x,qqset6$y)

shapiro.test(newdata3$Set6)

qqset7=qqnorm(newdata3$Set7)

cor(qqset7$x,qqset7$y)

shapiro.test(newdata3$Set7)

qqset8=qqnorm(newdata3$Set8)

cor(qqset8$x,qqset8$y)

shapiro.test(newdata3$Set8)

qqset9=qqnorm(newdata3$Set9)

cor(qqset9$x,qqset9$y)

shapiro.test(newdata3$Set9)

qqset10=qqnorm(newdata3$Set10)

cor(qqset10$x,qqset10$y)

shapiro.test(newdata3$Set10)

qqsetpolscore=qqnorm(newdata3$Polygenic.Score)

cor(qqsetpolscore$x,qqsetpolscore$y)

shapiro.test(newdata3$Polygenic.Score)

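#Dataset Factors

Factors=read.csv("Factors.csv")#loads the data file (assumes it was downloaded from the OSF link into the working directory)

newdata4=na.omit(Factors)#drops rows with missing values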

qqsetX4=qqnorm(newdata4$X4SNPs.g.factor)

cor(qqsetX4$x,qqsetX4$y)

shapiro.test(newdata4$X4SNPs.g.factor)

qqsetX9=qqnorm(newdata4$X9.SNPs.factor)

cor(qqsetX9$x,qqsetX9$y)

shapiro.test(newdata4$X9.SNPs.factor)

qqsetgpol=qqnorm(newdata4$G.Polygenic.Score)

cor(qqsetgpol$x,qqsetgpol$y)

shapiro.test(newdata4$G.Polygenic.Score)

#Scatterplot (Q-Q cor vs Shapiro-Wilk W)

library(car)

newdatascatterplot=na.omit(qqplots..BetaFst)#load .csv file with results (download from Google Docs link)

scatterplot(newdatascatterplot$SHAPIRO.WILKS.W~newdatascatterplot$CORR.Q.Q.PLOT,main="Q-Q Plot xy cor vs Shapiro-Wilk W (r=0.99)", xlab="Q-Q Plot xy cor",ylab="Shapiro-Wilk W",smoother=FALSE) #creates regression scatterplot with Q-Q plot correlation (x axis) and Shapiro-Wilk W (y axis); recent versions of car use smooth=FALSE instead of smoother=FALSE

cor(newdatascatterplot$SHAPIRO.WILKS.W,newdatascatterplot$CORR.Q.Q.PLOT) #computes correlation between the two methods
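The same three calls (qqnorm, cor, shapiro.test) are repeated for every variable in the script above. As a rough sketch only, they could be wrapped in a small helper. The function name normality_summary and the simulated demo data are hypothetical and are not part of the original analysis. Note also that the helper drops missing values per variable, whereas the script above drops whole rows with na.omit().

```r
# Hypothetical helper (not part of the original script): returns the Q-Q plot
# correlation and the Shapiro-Wilk results for one numeric vector.
normality_summary <- function(x) {
  x <- x[!is.na(x)]                  # drop missing values in this vector only
  qq <- qqnorm(x, plot.it = FALSE)   # theoretical vs. sample quantiles, no plot drawn
  sw <- shapiro.test(x)              # Shapiro-Wilk test of normality
  c(qq_cor    = cor(qq$x, qq$y),     # correlation between the two Q-Q axes
    shapiro_W = unname(sw$statistic),
    shapiro_p = sw$p.value)
}

# Illustration on simulated data only; the real columns are in NineHitsBetaFst_B.
set.seed(1)
demo <- data.frame(a = rnorm(50), b = rexp(50))
results <- t(sapply(demo, normality_summary))  # one row per variable
print(results)
```

With the real data, something like t(sapply(newdata3[sapply(newdata3, is.numeric)], normality_summary)) should produce the table of Q-Q correlations and W statistics in one step, assuming every numeric column has between 3 and 5000 non-missing values (the range accepted by shapiro.test).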

Ice cream, anyone?

I submitted my paper to a journal which is famous for publishing second-class papers, thinking that my paper, which is obviously superior to the average paper published in Evolutionary Psychology (the little journal of just-so stories for kids), would get a fair hearing. However, my hopes were shattered when a board of experts decided that it was not suitable for the journal. Of course, in the best totalitarian tradition, no reasons were provided for the rejection. This shows how a few gatekeepers decide to stop ideas from becoming public if they do not conform to their tastes. You, the know-it-all who rejected my paper, do you know that science is not like buying an ice cream? You cannot simply say “I want chocolate, I do not want strawberry”. You need to justify your decisions. Otherwise, how can you expect authors to justify their hypotheses with sound empirical evidence? But of course you do not know what science really is. Ice cream, Bern Hard Funk?

Dear …..,

Thank you for your submission to Evolutionary Psychology. We have given your submission full attention. However, after consultation with the Editorial board, we have decided that your manuscript is not suitable for publication in Evolutionary Psychology, and thus won’t be sent out for in-depth review. I am sorry for being the bearer of what must be negative news. The Editors of Evolutionary Psychology aim to give quick feedback particularly with submissions, which are unlikely to get accepted even after in depth review and/or revision. Alas your submission falls into this category and was therefore rejected at this stage.

Best of luck with your work.

Sincerely,

….

Fitness or longevity? Life as a photocopier or a time machine?

The standard view of evolutionary biology is that life’s aim is to maximise the number of genes (strictly speaking, “alleles” is the correct term) that are passed on from one generation to the next. This view disregards the importance of time. In this model, popularized by R. Dawkins, life is a photocopier which focuses on making as many copies of an allele as possible.

In fact, a more accurate formulation of fitness would be this: the total amount of time that an allele is present in the DNA of a living organism. Thus, number of alleles x time, or f = N x t.

If an organism generates 2 long-lived offspring that die when they’re 100 years old, its fitness (with time measured in years) would be 200, the same level of fitness as an organism that has 200 offspring each dying after 1 year, or 2400 offspring with a lifespan of 1 month.

This accounts for organisms’ tendency to extend their life well past their reproductive age – a phenomenon that classical evolutionary biology cannot explain without recourse to just-so stories (e.g. the fitness benefits of the elderly to subsequent generations) – or for the tendency of evolution to produce long-lived species which generate few offspring (K-selection).
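The arithmetic behind the three cases above can be checked in a few lines of R. This is only a sketch of f = N x t as defined in this post; the function name allele_time is made up for illustration, and lifespans are expressed in years so that one month becomes 1/12.

```r
# Allele-time "fitness" as defined in the text: f = N x t,
# with N = number of offspring and t = lifespan in years.
allele_time <- function(n_offspring, lifespan_years) {
  n_offspring * lifespan_years
}

cases <- data.frame(
  n_offspring    = c(2, 200, 2400),
  lifespan_years = c(100, 1, 1 / 12)   # one month expressed as 1/12 of a year
)
cases$f <- allele_time(cases$n_offspring, cases$lifespan_years)
print(cases)
```

All three rows come out at 200 (up to floating-point rounding), which is the equivalence the example relies on.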

Who knows?

To illustrate the concept I outlined in my previous post, I found this beautiful text written by philosopher-logician Raymond Smullyan, who, if not a skeptic, can hardly be called a spiritualist (Italics are mine). R. Smullyan (2003). Who Knows?, pp. 26-27.

 “The fact that there is no reliable evidence that the living have ever communicated with departed spirits constitutes strong probabilistic evidence that the living have never yet established such communication and, most likely, never will. But is it scientifically legitimate to conclude that there are no departed souls?

The point is that there is such a thing as well-designed and poorly designed experiments- in brief, good and bad experiments. Well, the experiments of mediums strike me as incredibly bad!

Why on earth should one expect that because a medium goes into a trance, a departed spirit should take over his or her body? For that matter, suppose I light a fire in my fireplace, hoping that in the middle of the night, after the fire goes out, a departed spirit will write a message in the ashes. The next morning, there is no message. Suppose this experiment is repeated millions of times and always with negative results. What conclusion should be drawn? That there are probably no departed spirits? No. The right conclusion is that if there are any spirits, they don’t write messages in ashes.

My whole point, of course, is that spiritualism and survival are very different things, and that the negative results of spiritualistic investigation do confirm that spiritualism is probably false, but casts no light on the probability of survival.

I wish to urge that the belief in the after life is neither unscientific nor scientific, but completely tangential to science.”

Not showing that something is not there is not showing that something is not there.

From the sentence above, it could look like the first and the second part of the sentence were equivalent. The same confusion seems to infect the minds of most people, with scientists and lay people alike being its frequent victims.

In reality, the sentence above can be more clearly stated as: To show that something exists is different from not showing that something exists.

The argument from ignorance states that a proposition is true because it has not been proven false or that it is false because it has not been proven true. This argument is used by skeptics and believers alike. For example, atheists conclude that God does not exist because its existence has not been proven. On the other hand, believers think that God exists because its non-existence has not been proven, that is to say the statement that God exists has not been proven false. This is a fallacy of informal logic because it excludes a third option, which we’ll see later on.

At least believers can see that they feel or perceive God, and this entity is real for them. On the other hand, atheists only have a lack of evidence because they cannot perceive God. So the burden of proof is on atheists to show that God does not exist.

Most atheists and most believers alike commit the sin of overestimating the capacity of the human intellect: a failure to see or understand something is taken to imply that it does not exist or is false, completely forgetting that if we do not see something, it could be that we’re simply blind.

Aside from the war raging between atheists and believers, this attitude has infected many areas of science but I will deal with those with which I am most familiar. Randomness is the entity advocated by skeptics to deny the existence of other processes, which possess the properties of predictability or purpose.

The argument from ignorance affects biology in two ways. The most “ignorant” theory in evolutionary genetics is the neutral theory of molecular evolution, according to which most evolution at the molecular level is caused by random drift of alleles that are neutral, thus denying or minimizing the importance of natural selection. Very few geneticists believe that random drift entirely accounts for the evolution of species over time; most admit the importance of natural selection.

However, the current consensus is that the genetic mutations that are the material for selection are random. This is based on the absence of compelling evidence to the contrary. To be sure, there is some evidence that genetic mutations are not random, but this is dismissed as preliminary or not strong enough to be taken seriously.

Thus, a lack of strong evidence for non-randomness is taken as evidence that intelligent processes do not operate in the origin of genetic mutations.

Not even a complete lack of evidence, but a failure to find “convincing” evidence for the theory that genetic mutations are non-random is taken as evidence for their randomness. The researcher who tries to find evidence for non-randomness and fails to find it will conclude that randomness is the real process. Instead, wouldn’t it be wiser to doubt one’s ability to detect deviations from randomness? Especially since life forms are so amazingly complex, beautiful and intelligent. It’s funny indeed how scientists require extraordinary proof for statements that are discordant with (their peculiar form of) common sense, such as the existence of psychic phenomena, but they’re happy to agree with a theory that common sense regards as crazy, that is, the creation of complex life forms from chaos, and for which there is no positive evidence but only negative evidence (that is, only absence of evidence that genetic mutations are non-random).

When scientists reject the evidence for psychic phenomena, they realize that the argument from ignorance is not sufficient because the evidence for psychic phenomena is indeed quite impressive (25 years of research at the Princeton Engineering Anomalies Research Lab and the Stanford Research Institute, plus ganzfeld experiments meta-analyzed with odds against chance that would make any contrarian punter pale and give up). So they have made up a rule that “extraordinary claims require extraordinary evidence”, without specifying what makes a claim extraordinary. This rule is an appeal to common sense, which traces its origins back to David Hume, the father of common sense philosophy.

But the common sense of lay people is different from that of scientists. Lay people regard Darwinian evolution as crazy and incredible, whereas they are more ready to accept the existence of psychic phenomena. So which common sense does science appeal to? The real common sense is that of the lay people, because it’s more “common” than the common sense of scientists, who are a tiny minority of the population.

Having taken Darwin on a stroll with us, we now meet Kendall, the statistician of tau’s fame, whom we’ll happily follow in a random walk to the toilet and let him randomly go down it.

This guy has the merit of having proven to the world that he was a loser as a trader, but he turned his failure into an academic win by pretending he had demonstrated that Wall Street (short for all financial markets) is random, or that given a price taken at any moment it is impossible to predict a price at a future moment. This kind of failure is all too common in the traders’ community, with the only difference that traders who lose money usually blame their losses on themselves or at best on bad luck. Kendall instead devised a clever way out of his failure, assuming that because he could find no patterns, there were no patterns, thus the markets must be random. To traders who consistently outperform the market this is pure nonsense, as it is to any person with a bit of common sense who realizes that if millions of transactions take place every day in financial markets, they must have a purpose or rational basis. The reality is that beating the markets is possible, albeit extremely difficult, because the market is not (completely) random.