LD and its impact on cross-population correlations of allele frequencies

Linkage disequilibrium (LD) is the non-random association of alleles at different loci within a population. It is quantified by the coefficient of linkage disequilibrium:

D_{AB}=p_{AB}-p_{A}p_{B},

where A and B are alleles at two different loci, p_{A} and p_{B} are their frequencies, and p_{AB} is the frequency of the haplotype carrying both.
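To make the definition concrete, here is a minimal Python sketch that computes D from a haplotype frequency and two allele frequencies. The function names and the example frequencies are made up for illustration. The second function uses the standard normalization r = D / sqrt(p_A(1-p_A)p_B(1-p_B)), which is not defined in this post but is the usual way to express LD as a correlation.

```python
import math


def ld_coefficient(p_ab: float, p_a: float, p_b: float) -> float:
    """Return D_AB = p_AB - p_A * p_B."""
    return p_ab - p_a * p_b


def ld_correlation(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r, i.e. D scaled by the allele-frequency variances.

    This normalization is an addition for illustration; the text above
    only defines D itself.
    """
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("r is undefined when an allele is fixed or absent")
    return ld_coefficient(p_ab, p_a, p_b) / denom


# Hypothetical frequencies: both alleles at 50%, found together on 40% of haplotypes.
d = ld_coefficient(0.4, 0.5, 0.5)
r = ld_correlation(0.4, 0.5, 0.5)
print(f"D = {d:.3f}, r = {r:.3f}")  # D = 0.150, r = 0.600
```

If the two alleles were independent, p_AB would equal p_A * p_B = 0.25 and D would be zero.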

There is, however, another kind of correlation between alleles: the correlation of their frequencies across populations.

The cross-population correlation between two unlinked alleles is expected to be r=0, whereas linkage disequilibrium increases it. Two perfectly linked alleles should have a cross-population correlation of 1, equal to their within-population LD. In practice, however, LD patterns differ between populations, a phenomenon known as “linkage breakdown”. As far as I know, there are no publications trying to quantify linkage breakdown in human populations.

Linkage breakdown reflects the extent to which the correlation between true and predicted values decays, approximately linearly, with the genetic distance between the training and the target populations, owing to their different linkage disequilibrium patterns (Marigorta & Navarro, 2013). That is, if an association between gene X and phenotype Y is found in one population (the training population), its replicability in other populations will depend on their genetic distance from the training population. This is because the SNPs found by GWAS are usually not the causal variants themselves but “tag” (proxy) SNPs in LD with the real causal variants. If LD breaks down, the allele frequency distributions are affected as well. Hence, tag SNPs will not necessarily have the same allele frequencies as the causal SNPs in all populations.

To estimate the level of LD breakdown in a way that bears on the validity of my method, which is based on factor analysis of allele frequencies, I computed the cross-population correlation between the frequencies of SNPs in LD. As a comparison, I did the same for a set of random SNPs (with LD<0.5).

LD was calculated using the R package “rsnps”, with the CEU panel.

The frequencies of the SNPs in LD (N=93) with a GWAS hit (rs301800) reported by Okbay et al. (2016) were downloaded from 1000 Genomes. The correlation between the minor allele frequency of each SNP and that of rs301800 was computed. The average correlation was r=0.815.

Conversely, the average correlation between an SNP from the set of random SNPs and all the other SNPs was, as expected, not significantly different from zero (r=0.053).
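The comparison above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code I ran: the function name is hypothetical, the frequencies are invented toy values for four populations, and NumPy is assumed to be available. Each row holds one SNP's allele frequency across populations, and the function averages the Pearson correlation of every row with the focal SNP.

```python
import numpy as np


def mean_correlation_with_focal(focal, others):
    """Average Pearson r between a focal SNP and each other SNP.

    focal  : allele frequencies of the focal SNP, one value per population
    others : 2-D array-like, one row per SNP, same population order
    """
    focal = np.asarray(focal, dtype=float)
    others = np.asarray(others, dtype=float)
    if others.ndim != 2 or others.shape[1] != focal.shape[0]:
        raise ValueError("each row of `others` must match the length of `focal`")
    # np.corrcoef returns a 2x2 matrix; [0, 1] is the correlation of interest.
    rs = [np.corrcoef(focal, row)[0, 1] for row in others]
    return float(np.mean(rs))


# Toy data (invented): frequencies in four populations.
focal_snp = [0.10, 0.30, 0.50, 0.70]
linked_snps = [
    [0.20, 0.40, 0.60, 0.80],  # tracks the focal SNP exactly
    [0.15, 0.30, 0.55, 0.65],  # tracks it closely
]
print(round(mean_correlation_with_focal(focal_snp, linked_snps), 3))
```

The same function can be applied to the random SNP set by picking one of them as the focal SNP and passing the rest as `others`.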

This analysis is neither exhaustive nor conclusive, but it suggests that LD decay is unlikely to be a big problem, because the correlation among SNPs in LD remains high (average r=0.815) across the 26 populations. Further analysis restricted to populations from particular continents would show whether LD breaks down more in some continents than in others. For example, do SNPs in LD among Europeans show more linkage breakdown among East Asians or among Africans? One could compute the correlation between allele frequencies within East Asian and within African sub-populations separately. If the correlation is stronger among East Asians, this would suggest that LD patterns among Africans differ more from the European ones.
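A possible way to set up this follow-up is sketched below in Python. It has not been run on real data: the function name, the group labels and all frequencies are hypothetical, and NumPy is assumed. The idea is simply to restrict the correlation between a focal SNP and a proxy SNP to the populations belonging to one continental group.

```python
import numpy as np


def correlation_within_group(focal, proxy, groups, group):
    """Pearson r between two SNPs' frequencies, using only one group's populations.

    focal, proxy : allele frequencies, one value per population
    groups       : continental label of each population (same order)
    group        : the label to keep, e.g. "EAS" or "AFR"
    """
    mask = np.array([g == group for g in groups])
    if mask.sum() < 3:
        raise ValueError("need at least three populations for a meaningful r")
    x = np.asarray(focal, dtype=float)[mask]
    y = np.asarray(proxy, dtype=float)[mask]
    return float(np.corrcoef(x, y)[0, 1])


# Invented toy data: four East Asian and four African populations.
groups = ["EAS"] * 4 + ["AFR"] * 4
focal = [0.10, 0.12, 0.15, 0.18, 0.30, 0.35, 0.40, 0.45]
proxy = [0.11, 0.13, 0.16, 0.19, 0.50, 0.42, 0.47, 0.40]

r_eas = correlation_within_group(focal, proxy, groups, "EAS")
r_afr = correlation_within_group(focal, proxy, groups, "AFR")
print(f"EAS r = {r_eas:.2f}, AFR r = {r_afr:.2f}")
```

With only a handful of sub-populations per continent, such correlations would be noisy, so averaging over many SNP pairs (as in the analysis above) would be needed before drawing conclusions.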

 

 

References:

Marigorta, U.M., & Navarro, A. (2013). High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants. PLOS Genetics, 9(6), e1003566. http://dx.doi.org/10.1371/journal.pgen.1003566

 

 

 
