S: 22 Oct 2024 (b.7.7) D: 22 Oct 2024
Epistasis tests

Fast scan, case/control phenotype

--fast-epistasis [{boost | joint-effects | no-ueki}] ['case-only'] [{set-by-set | set-by-all}] ['nop']
--gap <min kb gap for case-only test>

--fast-epistasis starts an imprecise but fast scan for epistasis based on inspection of 3x3 joint genotype count tables. For large datasets, it is reasonable to start with this command (using liberal p-value thresholds) to identify candidate pairs for further investigation, and then follow up with a more rigorous and computationally expensive analysis on those pairs, such as the --epistasis logistic regression below. Results are usually written to plink.epi.cc and .epi.cc.summary.

By default, the original allele-based test (see the PLINK 1.07 documentation for details) is applied to these tables. Two newer tests are now supported: 'boost' invokes an extended version (missing data is now permitted, and df is properly adjusted when e.g. a variant lacks homozygous minor genotype observations) of the likelihood ratio test introduced by Wan X et al. (2010) BOOST: A Fast Approach to Detecting Gene-Gene Interactions in Genome-wide Case-Control Studies, while 'joint-effects' applies the joint effects test introduced in Ueki M, Cordell HJ (2012) Improved statistics for genome-wide interaction analysis. Results for the original test normally differ slightly from PLINK 1.07 since we apply the variance and empty cell corrections suggested in Ueki and Cordell's paper. To disable these corrections for testing purposes, use the 'no-ueki' modifier.

To perform a case-only test instead of a case/control test, add the 'case-only' modifier. Since this test assumes the two variants are in linkage equilibrium in the general population, pairs closer than 1000 kb are normally skipped; this setting can be adjusted with --gap.

All pairs of polymorphic variants on autosomal diploid chromosomes are normally tested.
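To make the pair selection concrete, here is a minimal Python sketch of how an all-pairs scan with a case-only distance filter could enumerate its candidates. It is an illustration only, not PLINK's implementation: the function name, the tuple layout, and the reading of "closer than" as a strict comparison applied to same-chromosome pairs are all assumptions made for the example.

```python
from itertools import combinations

def case_only_pairs(variants, min_gap_kb=1000):
    """Yield (id_a, id_b) pairs that this sketch would keep for a case-only scan.

    variants: list of (variant_id, chromosome, bp_position) tuples (hypothetical layout).
    Same-chromosome pairs closer than min_gap_kb are skipped; pairs on
    different chromosomes are always kept.
    """
    min_gap_bp = min_gap_kb * 1000
    for (id_a, chr_a, pos_a), (id_b, chr_b, pos_b) in combinations(variants, 2):
        if chr_a == chr_b and abs(pos_a - pos_b) < min_gap_bp:
            continue  # too close for the linkage equilibrium assumption
        yield id_a, id_b

# Toy variant list (made-up IDs and positions).
variants = [
    ("rs1", "1", 100_000),
    ("rs2", "1", 600_000),    # 500 kb from rs1
    ("rs3", "1", 2_000_000),
    ("rs4", "2", 150_000),    # different chromosome
]

pairs = list(case_only_pairs(variants))
print(len(pairs), "of", len(variants) * (len(variants) - 1) // 2, "pairs kept")
```

With n variants there are n(n-1)/2 candidate pairs before filtering, which is why a fast first-pass scan is attractive on large datasets.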
To just test pairs of variants within a single set, add the 'set-by-set' modifier and load exactly one set with --set/--make-set; with exactly two sets loaded, all variants in one set are tested against all variants in the other. 'set-by-all' tests all variants in one set against the entire genome instead.

--epi1 adjusts the (screening, for the 'boost' test) p-value for inclusion of pairs in the main report; if not specified, it defaults to 0.0001 (5e-6 for 'boost'). (With small datasets, "--epi1 1" makes sense; but it may fill up your hard drive for little reason when used on large ones.) Usually both raw chi-square statistics and p-values are reported; 'nop' removes the p-values.

--epi2 adjusts the p-value threshold (default 0.01) for qualification as a "significant epistatic test result" counted in the .cc.summary report's third column. For the 'boost' test, --epi2 applies to the screening p-value unless its parameter is no larger than the --epi1 parameter.

The joint-effects test normally skips marker pairs with fewer than 5 observations in any 3x3x2 contingency table cell (cases and controls are considered separately); you can adjust this threshold with --je-cellmin.

Linear/logistic regression-based test

--epistasis [{set-by-set | set-by-all}]

Given a quantitative trait, --epistasis uses linear regression to fit the model

$$Y = \beta_0 + \beta_1 g_A + \beta_2 g_B + \beta_3 g_A g_B$$

for each inspected variant pair (A, B), where $g_A$ and $g_B$ are allele counts; then the $\beta_3$ coefficients are tested for significance, and results are written to plink.epi.qt and .epi.qt.summary. Similarly, given a case/control phenotype, --epistasis uses logistic regression to fit

$$\ln\frac{P(Y = \text{case})}{P(Y = \text{control})} = \beta_0 + \beta_1 g_A + \beta_2 g_B + \beta_3 g_A g_B$$

and writes results to plink.epi.cc and plink.epi.cc.summary.

--epi1, --epi2, and the 'set-by-set'/'set-by-all' modifiers behave as they do with --fast-epistasis. The linear regression's multicollinearity check can be tuned with --vif.
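As a rough illustration of the quantitative-trait model, the following Python sketch simulates allele counts for one variant pair, fits the four-term linear model by ordinary least squares with NumPy, and forms a t statistic for the interaction coefficient. The simulated data, allele frequencies, effect sizes, and variable names are all assumptions made for the example; this is not PLINK's implementation, and it leaves out missing-data handling and the --vif multicollinearity check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated allele counts (0, 1, or 2) at two variants; frequencies are made up.
g_a = rng.binomial(2, 0.3, size=n)
g_b = rng.binomial(2, 0.4, size=n)

# Design matrix: intercept, g_A, g_B, and the interaction term g_A * g_B.
X = np.column_stack([np.ones(n), g_a, g_b, g_a * g_b])

# Simulated quantitative trait with a true interaction effect of 0.5.
true_beta = np.array([1.0, 0.2, -0.1, 0.5])
y = X @ true_beta + rng.normal(0.0, 0.5, size=n)

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard error of each coefficient from the residual variance.
residuals = y - X @ beta_hat
sigma2 = residuals @ residuals / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# t statistic for the interaction coefficient (index 3).
t_stat = beta_hat[3] / np.sqrt(cov[3, 3])
print("interaction estimate:", beta_hat[3], "t statistic:", t_stat)
```

A case/control phenotype would use the same design matrix with a logistic fit in place of least squares.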
Distributed computation

--epistasis-summary-merge <common file prefix> <count>

--fast-epistasis and --epistasis jobs can be subdivided with the --parallel flag; however, the variant-based summary files require a specialized merge at the end. --epistasis-summary-merge takes care of this; its first parameter is the common filename prefix up to but not including '.summary.', while the second parameter is the number of files to merge. For example, if you split

    plink --bfile main_data --fast-epistasis boost --parallel 1 3 --out epi_part
    plink --bfile main_data --fast-epistasis boost --parallel 2 3 --out epi_part
    plink --bfile main_data --fast-epistasis boost --parallel 3 3 --out epi_part

across three machines, and then gather the output files (epi_part.epi.cc.{1,2,3}, epi_part.epi.cc.summary.{1,2,3}) in one place, you'd merge the main reports with

    cat epi_part.epi.cc.1 epi_part.epi.cc.2 epi_part.epi.cc.3 > epi_final.epi.cc

as usual, and handle the summaries with

    plink --epistasis-summary-merge epi_part.epi.cc 3 --out epi_final

If these functions are still insufficient for your epistasis scanning needs, and you are sure you want more brute force rather than a different kind of analysis, we recommend trying the GPU-based GBOOST tool.

Single interaction

--twolocus <variant ID> <variant ID>

--twolocus writes tables of joint genotype counts and frequencies between the two specified variants to plink.twolocus. With a case/control phenotype, counts and frequencies are also reported for just cases and just controls.
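The kind of table --twolocus reports can be sketched in a few lines of Python. The example below builds a 3x3 joint genotype count table and the matching frequencies for all samples, then for cases and controls separately. The 0/1/2 allele-count coding, the use of None for a missing call, the choice to drop samples with a missing call, and the function name are assumptions for illustration; the actual plink.twolocus layout may treat missing genotypes differently.

```python
def joint_genotype_table(geno_a, geno_b):
    """Return (counts, freqs) as 3x3 nested lists indexed [genotype_a][genotype_b].

    Genotypes are allele counts 0, 1, or 2; None marks a missing call.
    Samples with a missing call at either variant are skipped in this sketch.
    """
    counts = [[0] * 3 for _ in range(3)]
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue
        counts[a][b] += 1
    total = sum(sum(row) for row in counts)
    freqs = [[c / total if total else 0.0 for c in row] for row in counts]
    return counts, freqs

# Toy data for eight samples (made-up values).
geno_a = [0, 1, 2, 1, 0, None, 2, 1]
geno_b = [0, 1, 1, 2, 0, 1, None, 1]
is_case = [True, True, False, False, True, False, True, False]

counts_all, freqs_all = joint_genotype_table(geno_a, geno_b)

# Split the samples by phenotype and tabulate each group separately.
case_a = [a for a, c in zip(geno_a, is_case) if c]
case_b = [b for b, c in zip(geno_b, is_case) if c]
ctrl_a = [a for a, c in zip(geno_a, is_case) if not c]
ctrl_b = [b for b, c in zip(geno_b, is_case) if not c]
counts_case, freqs_case = joint_genotype_table(case_a, case_b)
counts_ctrl, freqs_ctrl = joint_genotype_table(ctrl_a, ctrl_b)

for label, table in [("all", counts_all), ("cases", counts_case), ("controls", counts_ctrl)]:
    print(label, table)
```

Comparing the case and control tables by eye is a quick sanity check on any pair flagged by the scans above.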