S: 22 Oct 2024 (b.7.7) D: 22 Oct 2024
Distance matrices

Identity-by-state/Hamming

--distance [{square | square0 | triangle}] [{gz | bin | bin4}] ['ibs'] ['1-ibs'] ['allele-ct'] ['flat-missing']

--distance is the primary interface to PLINK 1.9's IBS and Hamming distance calculation engine.

Output formats

Units

Distance weights
Missingness correction

Distributed computation

Backwards compatibility

--distance-matrix

These deprecated flags generate space-delimited text matrices, and are included for backwards compatibility with scripts relying on the corresponding PLINK 1.07 flags. New scripts should migrate to "--distance 1-ibs flat-missing" and "--distance ibs flat-missing". Note that you are no longer required to use these flags in conjunction with --cluster.

Reloading

--read-dists <distance file> [ID file]

If you've previously generated a distance matrix using "--distance triangle bin", this lets you reload it for --cluster, --neighbour, and the distance-phenotype analyses below. When no ID file is named, it is assumed that the distance matrix was generated with the same samples in the same order as in the current PLINK run. We are likely to extend this flag to support more --distance output formats in the future.

Relationship/covariance

--make-rel [{square | square0 | triangle}] [{gz | bin | bin4}] [{cov | ibc2 | ibc3}]

--make-rel is the primary interface to PLINK 1.9's genetic relationship matrix and covariance matrix calculator. (See Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: A Tool for Genome-wide Complex Trait Analysis for discussion of relationship matrix definition and usage. Note that this calculation is not LD-sensitive; if that's a problem, we currently recommend using Doug Speed et al.'s LDAK software instead.)

Output formats

Variance-standardization

Inbreeding estimates on diagonal

Distributed computation

Exporting to GCTA

--make-grm-gz ['no-gz'] [{cov | ibc2 | ibc3}]

--make-grm-gz and --make-grm-bin perform the same calculation as --make-rel (so the 'cov', 'ibc2', and 'ibc3' modifiers have the same effect), but produce a .grm.gz or .grm.bin-format file for GCTA to process. (--make-grm-gz's 'no-gz' modifier turns off gzipping of the main output file.)
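For orientation, the off-diagonal relationship estimate defined in the GCTA paper cited above can be sketched in a few lines of Python. This is only an illustration of that textbook formula, not PLINK's implementation: the function name `relationship` and the list-of-lists genotype layout are hypothetical, and the sketch assumes no missing calls and no monomorphic variants. It does not cover the diagonal (inbreeding) entries or the 'cov' variant.

```python
def relationship(genotypes, j, k):
    """Off-diagonal relationship estimate between samples j and k.

    genotypes[i][s] is the 0/1/2 allele count of variant i in sample s
    (hypothetical layout chosen for this sketch).
    Assumes no missing calls and 0 < allele frequency < 1 for every variant.
    """
    total = 0.0
    for row in genotypes:
        # Sample allele frequency of the counted allele at this variant.
        p = sum(row) / (2 * len(row))
        # Product of centered allele counts, scaled by the expected
        # variance 2p(1-p) (the variance-standardized form).
        total += (row[j] - 2 * p) * (row[k] - 2 * p) / (2 * p * (1 - p))
    # Average over variants.
    return total / len(genotypes)
```

For example, `relationship([[0, 1, 2, 1], [2, 1, 0, 1]], 0, 2)` evaluates the estimate for the first and third samples over two variants. A real calculation also has to handle missing genotypes and very large inputs, which this sketch ignores.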
The --make-grm-bin computation was switched from single-precision to double-precision internal arithmetic in Nov 2014; see e.g. this real-world instance of insufficient precision leading to flawed science for motivation. (We don't actually expect any of GCTA's results to be dangerously inaccurate, especially when fewer than ~10 million markers are involved, but we figure a 1.2x-2x speed penalty here is an acceptable price to pay for peace of mind.)

These computations can be subdivided with --parallel.

Relationship-based pruning

--rel-cutoff [maximum]

If used in conjunction with a later calculation (see the order of operations page for details), --rel-cutoff excludes one member of each pair of samples with observed genomic relatedness greater than the given cutoff value (default 0.025) from the analysis. Alternatively, you can invoke this on its own to write a pruned list of sample IDs to plink.rel.id.

PLINK tries to maximize the final sample size, but this maximum independent set problem is NP-hard, so we use a greedy algorithm which does not guarantee an optimal result. In practice, PLINK --rel-cutoff does yield a maximum set whenever there aren't too many intertwined close relations, and it outperforms GCTA --grm-cutoff when there are (we chose our greedy algorithm carefully); but if you want to try to beat both programs, use the --make-rel and --keep/--remove flags and patch your preferred approximation algorithm in between. (We may add one or two levels of backtracking to our --rel-cutoff if its level of imperfection becomes problematic.)

Note that, while it is possible to use --rel-cutoff on a previously calculated relationship matrix by combining it with --grm-gz/--grm-bin (like how GCTA --grm-cutoff is used), we do not expect that to be the typical workflow.

Distributed computation

--make-rel and --make-grm-gz/--make-grm-bin jobs can be subdivided with the --parallel flag.
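As an aside on the pruning problem described under --rel-cutoff: the sketch below shows one generic greedy heuristic for it, which could also serve as a starting point if you patch your own algorithm in between --make-rel and --keep/--remove. It is not PLINK's actual rule, which the text does not spell out. The function names `pairs_above_cutoff` and `greedy_prune` and the "drop the sample with the most related partners first" tie-breaking are assumptions made for illustration; only the default cutoff of 0.025 comes from the text.

```python
def pairs_above_cutoff(rel, cutoff=0.025):
    """List index pairs (i, j), j < i, whose relatedness exceeds the cutoff.

    rel[i][j] must be defined for j < i (square or lower-triangular lists).
    """
    return [(i, j) for i in range(len(rel)) for j in range(i)
            if rel[i][j] > cutoff]


def greedy_prune(n_samples, related_pairs):
    """Return sorted sample indices to keep so that no related pair remains.

    Greedy heuristic (an assumption, not PLINK's documented rule):
    repeatedly drop the sample with the most remaining related partners,
    breaking ties in favour of the lowest index.
    """
    neighbours = {s: set() for s in range(n_samples)}
    for i, j in related_pairs:
        if i != j:
            neighbours[i].add(j)
            neighbours[j].add(i)
    # Loop until no sample has a related partner left.
    while any(neighbours.values()):
        worst = max(neighbours, key=lambda s: (len(neighbours[s]), -s))
        # Remove the sample and detach it from its partners.
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
    return sorted(neighbours)
```

With one sample related to three otherwise unrelated samples, `greedy_prune(4, [(0, 1), (0, 2), (0, 3)])` drops only the hub and keeps the other three. Like any greedy approach to maximum independent set, this can return a smaller set than the true optimum on densely intertwined pedigrees.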
--rel-cutoff, however, cannot run concurrently with parallel relationship matrix evaluation; instead, it must act on the final assembled matrix. This is the primary use case for --grm-gz/--grm-bin.

Distance-phenotype analysis

Case/control

--ibs-test [permutation count]
--groupdist [iteration count] [d]

--ibs-test and --groupdist consider three subsets of the distance matrix: pairs of affected samples, affected-unaffected pairs, and pairs of unaffected samples. Each of these subsets has a distribution of pairwise genomic distances; --ibs-test uses permutation to estimate p-values for which types of pairs are most similar (see here for details), while --groupdist focuses on the differences between the centers of these distributions and estimates standard errors via delete-d jackknife.

To perform this type of analysis with scalar phenotype data, you may combine --ibs-test/--groupdist with the --tail-pheno flag. However, the distance-phenotype regression described next should be more informative.

If --ibs-test is run with no parameters, 100000 permutations are used. If --groupdist is run with fewer than two parameters, d is set to <number of people>^0.6 rounded down; with no parameters, 100000 jackknife iterations are run.

When combining these commands with --read-dists, units must match: "--distance triangle bin ibs" goes with --ibs-test, while "--distance triangle bin" goes with --groupdist.

Distance-QT regression

--regress-distance [iteration count] [d]

These flags perform simple linear regressions and evaluate delete-d jackknife standard error estimates. --regress-distance regresses genomic distances on pairwise average phenotypes and vice versa, while --regress-rel regresses genomic relationships on pairwise average phenotypes and vice versa. With fewer than two parameters, d is set to <number of people>^0.6 rounded down. With no parameters, 100000 jackknife iterations are run.
A previously calculated triangular binary distance matrix can be loaded as input to --regress-distance using --read-dists. There is currently no similar shortcut for --regress-rel.
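To make the delete-d jackknife used by the distance-QT regression more concrete, here is a minimal Python sketch. It regresses pairwise genomic distance on pairwise average phenotype, repeatedly deletes d randomly chosen samples, and scales the spread of the resulting slopes by the textbook delete-d factor (n - d) / (d * iterations). All function names are hypothetical, the random-subset sampling and the scaling factor are assumptions about the general technique rather than a description of PLINK's internals, and the default iteration count is reduced from PLINK's 100000 to keep the example fast.

```python
import math
import random


def default_d(n):
    """d = <number of people>^0.6 rounded down, as stated in the text."""
    return int(n ** 0.6)


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (xs must not be constant)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def pair_slope(samples, pheno, dist):
    """Slope of genomic distance on pairwise average phenotype.

    samples: sorted sample indices to include.
    dist[i][j] must be defined for j < i.
    """
    xs, ys = [], []
    for a in range(len(samples)):
        for b in range(a):
            i, j = samples[a], samples[b]
            xs.append((pheno[i] + pheno[j]) / 2)
            ys.append(dist[i][j])
    return slope(xs, ys)


def jackknife_se(pheno, dist, iterations=1000, d=None, seed=0):
    """Delete-d jackknife standard error of the slope (illustrative only)."""
    n = len(pheno)
    if d is None:
        d = default_d(n)
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        # Keep n - d samples; every pair touching a deleted sample drops out.
        kept = sorted(rng.sample(range(n), n - d))
        estimates.append(pair_slope(kept, pheno, dist))
    mean = sum(estimates) / iterations
    ss = sum((e - mean) ** 2 for e in estimates)
    return math.sqrt((n - d) / (d * iterations) * ss)
```

Deleting whole samples rather than individual pairs matters here: the pairwise distances that share a sample are not independent, so resampling at the sample level is what keeps the error estimate meaningful.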