GWAS notes: SNP filtering

Updated: 2023-12-25 16:29:01

1: Missing rate (remove SNPs with GENO > 0.05)

Here we apply a more stringent criterion, GENO > 0.05: a SNP is removed if its genotype is missing in more than 5% of the samples. With 89 samples, 0.05 * 89 = 4.45, so a SNP that is missing in more than 4.45 samples (that is, in 5 or more) will be removed from the dataset.
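The missing-rate rule can be sketched in a few lines of Python. This is a minimal illustration, not PLINK code: it assumes genotypes are coded as the number of copies of one allele (0, 1, 2) with None for a missing call, and the helper names `missing_rate` and `passes_geno` are hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical coding: 0/1/2 copies of one allele per sample, None = missing call.
Genotype = Optional[int]


def missing_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    if not genotypes:
        raise ValueError("no samples")
    return sum(g is None for g in genotypes) / len(genotypes)


def passes_geno(genotypes: Sequence[Genotype], max_missing: float = 0.05) -> bool:
    """Keep the SNP unless its missing rate exceeds max_missing (the GENO > 0.05 rule)."""
    return missing_rate(genotypes) <= max_missing


if __name__ == "__main__":
    n = 89  # sample size used in the text
    four_missing = [None] * 4 + [0] * (n - 4)  # 4/89 is about 0.045
    five_missing = [None] * 5 + [0] * (n - 5)  # 5/89 is about 0.056
    print(passes_geno(four_missing), passes_geno(five_missing))  # True False
```

With 89 samples the cut-off falls between 4 and 5 missing calls, matching the 4.45 computed in the text.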

2: Minor allele frequency (remove SNPs with MAF < 0.03; if there are many SNPs, MAF < 0.05 can be used instead)

MAF is the minor allele frequency, i.e. the frequency of the less common allele at a SNP. It can be used to exclude uninformative SNPs, which show little variation in the sample set being analyzed. For instance, if a SNP varies in only 1 of the 89 individuals, it is of no statistical use and should be removed.
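A minimal Python sketch of the MAF calculation, assuming a biallelic SNP and diploid genotypes coded 0/1/2 (copies of one allele) with None for missing calls. The function names are hypothetical; missing calls are simply skipped, so the denominator is twice the number of called samples.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF of a biallelic SNP from diploid genotypes coded 0/1/2 (copies of one allele)."""
    called = [g for g in genotypes if g is not None]  # drop missing calls
    if not called:
        raise ValueError("no called genotypes")
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)  # the minor allele is whichever one is rarer


def passes_maf(genotypes: Sequence[Optional[int]], min_maf: float = 0.05) -> bool:
    """Keep the SNP only if its MAF is at least min_maf."""
    return minor_allele_frequency(genotypes) >= min_maf


if __name__ == "__main__":
    # One heterozygote among 89 individuals: MAF = 1 / 178.
    rare = [1] + [0] * 88
    print(round(minor_allele_frequency(rare), 4))  # 0.0056
    print(passes_maf(rare, 0.03), passes_maf(rare, 0.05))  # False False
```

A single heterozygote among 89 individuals gives MAF = 1/178, about 0.0056, so such a SNP fails both the 0.03 and the 0.05 threshold.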

3: Removing SNPs out of Hardy-Weinberg equilibrium (keep SNPs whose HWE p-value exceeds a threshold between 10^-6 and 10^-4)

Population genetic theory predicts that under "normal" conditions (random mating, no selection, migration or mutation, and a large population), there is a predictable relationship between allele frequencies and genotype frequencies. When the observed genotype distribution differs from what the allele frequencies predict, one possible explanation is genotyping error; natural selection is another. For this reason, in a case-control study we typically test for deviation from Hardy-Weinberg equilibrium (HWE) in the controls only, whereas for a quantitative trait PLINK uses all samples. PLINK can compute, for each SNP, a p-value for deviation from HWE; low p-values indicate that the SNP is out of HWE.
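To make "low p-value" concrete, here is a Python sketch of an exact HWE test for a biallelic SNP, following the algorithm of Wigginton, Cutler and Abecasis (2005): it conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the observed one. It is an illustration, not PLINK's own code; `hwe_exact_p` and `passes_hwe` are hypothetical names, and the inputs are assumed to be the genotype counts of whichever samples you test (e.g. the controls in a case-control study).

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Exact HWE test p-value from the counts of the three genotypes at one SNP."""
    if min(n_het, n_hom1, n_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    n_homr, n_homc = sorted((n_hom1, n_hom2))  # rarer / more common homozygote
    n = n_het + n_homr + n_homc  # number of genotyped individuals
    if n == 0:
        return 1.0
    rare = 2 * n_homr + n_het  # copies of the rarer allele

    # probs[h] is proportional to P(h heterozygotes | allele counts);
    # h must have the same parity as `rare`, so other entries stay 0.
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)  # start near the expected count
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down: two fewer heterozygotes means one more of each homozygote.
    hets, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk up: two more heterozygotes means one fewer of each homozygote.
    hets, homr, homc = mid, (rare - mid) // 2, n - mid - (rare - mid) // 2
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # p-value: total probability of outcomes no more likely than the observed one.
    p_obs = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def passes_hwe(n_het: int, n_hom1: int, n_hom2: int, threshold: float = 1e-4) -> bool:
    """Keep the SNP unless its HWE p-value falls below the threshold."""
    return hwe_exact_p(n_het, n_hom1, n_hom2) >= threshold


if __name__ == "__main__":
    print(round(hwe_exact_p(50, 25, 25), 3))  # about 1.0: counts match HWE expectations
    print(passes_hwe(0, 50, 50))  # False: no heterozygotes at all
```

Working from ratios of neighbouring heterozygote counts avoids computing large factorials directly.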

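4: If you have a VCF file, you can first convert it to PLINK input format with vcftools. This writes a .ped and a .map file, which are then used as the input for filtering:
vcftools --vcf my.vcf --plink --out plink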

plink --noweb --file plink --geno 0.05 --maf 0.05 --hwe 0.0001 --make-bed --out QC
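The plink command in step 4 applies the three filters in one run: --geno 0.05 removes SNPs whose missing rate exceeds 0.05, --maf 0.05 removes SNPs with MAF below 0.05, and --hwe 0.0001 removes SNPs whose HWE p-value is below 10^-4. The Python sketch below only models that keep/remove decision from per-SNP statistics assumed to be precomputed; the `SnpStats` record and the `keep_snp` and `qc_filter` functions are hypothetical, and PLINK computes these statistics itself and applies the filters in its own order of operations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpStats:
    snp_id: str
    missing_rate: float  # fraction of samples without a genotype call
    maf: float           # minor allele frequency
    hwe_p: float         # p-value of the HWE test


def keep_snp(s: SnpStats, geno: float = 0.05, maf: float = 0.05, hwe: float = 1e-4) -> bool:
    """Mirror the thresholds of --geno 0.05 --maf 0.05 --hwe 0.0001: all three must pass."""
    return s.missing_rate <= geno and s.maf >= maf and s.hwe_p >= hwe


def qc_filter(snps: List[SnpStats]) -> List[str]:
    """Return the IDs of the SNPs that survive all three filters."""
    return [s.snp_id for s in snps if keep_snp(s)]


if __name__ == "__main__":
    # Made-up statistics, one SNP per failure mode.
    snps = [
        SnpStats("snp_ok", 0.01, 0.20, 0.50),
        SnpStats("snp_missing", 0.08, 0.20, 0.50),
        SnpStats("snp_rare", 0.00, 0.01, 0.90),
        SnpStats("snp_hwe", 0.00, 0.30, 1e-7),
    ]
    print(qc_filter(snps))  # ['snp_ok']
```

A higher --hwe threshold removes more SNPs, so 0.0001 is the stricter end of the 10^-6 to 10^-4 range in step 3.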
