Identity-by-descent
These calculations are not LD-aware. It is usually a good idea to perform some form of LD-based pruning before invoking them.
--genome ['gz'] ['rel-check'] ['full'] ['unbounded'] ['nudge']
--ppc-gap <distance in kbs>
--min <minimum PI_HAT value>
--max <maximum PI_HAT value>
--genome invokes an IBS/IBD computation over autosomal SNPs (so chrX, chrY, and chrM are excluded), and then writes a report with the following fields to plink.genome:
FID1 | Family ID for first sample |
IID1 | Individual ID for first sample |
FID2 | Family ID for second sample |
IID2 | Individual ID for second sample |
RT | Relationship type inferred from .fam/.ped file |
EZ | IBD sharing expected value, based on just .fam/.ped relationship |
Z0 | P(IBD=0) |
Z1 | P(IBD=1) |
Z2 | P(IBD=2) |
PI_HAT | Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1) |
PHE | Pairwise phenotypic code (1, 0, -1 = AA, AU, and UU pairs, respectively) |
DST | IBS distance, i.e. (IBS2 + 0.5*IBS1) / (IBS0 + IBS1 + IBS2) |
PPC | IBS binomial test |
RATIO | HETHET : IBS0 SNP ratio (expected value 2) |
Note that there is one entry per pair of samples, so this file can be very large. The 'gz' modifier causes the output to be gzipped, 'rel-check' removes pairs of samples with different FIDs, and --min/--max remove lines with PI_HAT values below/above the given cutoff(s).
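As a rough illustration of what these filters do, the sketch below applies the same PI_HAT and FID conditions to an already-written report. It assumes the report is whitespace-delimited with a header row using the field names listed above; the function name filter_genome, its parameters, and the sample rows are all invented for this example, and whether PLINK treats values exactly equal to a cutoff as kept is an assumption here.

```python
import io

# Made-up rows in the report layout described above (values are placeholders,
# not real PLINK output).
SAMPLE_GENOME = """\
 FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
 F1 A F1 B FS 0.5 0.25 0.50 0.25 0.5000 -1 0.85 1.0000 9.5
 F1 A F2 C UN 0 1.00 0.00 0.00 0.0000 0 0.70 0.4200 2.0
 F2 C F2 D OT 0 0.50 0.50 0.00 0.2500 1 0.78 0.9900 4.1
"""


def filter_genome(handle, min_pi_hat=None, max_pi_hat=None, same_fid_only=False):
    """Return report rows (as dicts) that survive the given filters.

    min_pi_hat / max_pi_hat mimic --min / --max; same_fid_only mimics
    'rel-check'. Boundary values are kept (an assumption of this sketch).
    """
    header = handle.readline().split()
    col = {name: i for i, name in enumerate(header)}
    kept = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pi_hat = float(fields[col["PI_HAT"]])
        if min_pi_hat is not None and pi_hat < min_pi_hat:
            continue
        if max_pi_hat is not None and pi_hat > max_pi_hat:
            continue
        if same_fid_only and fields[col["FID1"]] != fields[col["FID2"]]:
            continue
        kept.append(dict(zip(header, fields)))
    return kept


related = filter_genome(io.StringIO(SAMPLE_GENOME), min_pi_hat=0.2)
print([(r["IID1"], r["IID2"]) for r in related])
```

Filtering inside PLINK with --min/--max is preferable for large datasets, since the unfiltered report has one line per sample pair.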
The 'full' modifier causes the following fields to be added:
IBS0 | Number of IBS 0 nonmissing variants |
IBS1 | Number of IBS 1 nonmissing variants |
IBS2 | Number of IBS 2 nonmissing variants |
HOMHOM | Number of IBS 0 SNP pairs used in PPC test |
HETHET | Number of IBS 2 het/het SNP pairs used in PPC test |
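The PI_HAT and DST definitions in the tables above can be checked by hand from the other fields. The following is a minimal sketch of that arithmetic; the function names pi_hat and ibs_distance are invented for this example and are not part of PLINK.

```python
def pi_hat(z1, z2):
    """Proportion IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def ibs_distance(ibs0, ibs1, ibs2):
    """DST: (IBS2 + 0.5*IBS1) / (IBS0 + IBS1 + IBS2), using the 'full' counts."""
    total = ibs0 + ibs1 + ibs2
    if total == 0:
        raise ValueError("no nonmissing variants for this pair")
    return (ibs2 + 0.5 * ibs1) / total


# Illustrative numbers only: a full-sibling-like pair.
print(pi_hat(z1=0.5, z2=0.25))
print(ibs_distance(ibs0=10, ibs1=40, ibs2=50))
```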
By default, the minimum distance between informative pairs of SNPs used in the pairwise population concordance (PPC) test is 500 kb (500,000 base pairs); you can change this with the --ppc-gap flag, which takes a distance in kb.
The underlying P(IBD=0/1/2) estimator sometimes yields numbers outside the range [0,1]; by default, these are clipped. The 'unbounded' modifier turns off this clipping. Then, if PI_HAT^2 < P(IBD=2), 'nudge' adjusts the final estimates to P(IBD=0) := (1-p)^2, P(IBD=1) := 2p(1-p), and P(IBD=2) := p^2, where p is the current PI_HAT.
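The 'nudge' rule can be written out as a short sketch. This covers only the adjustment step as described above, not PLINK's clipping logic, and the function name nudge is invented for this example. Note that the adjusted values keep PI_HAT unchanged, since p^2 + 0.5 * 2p(1-p) = p.

```python
def nudge(z0, z1, z2):
    """Apply the 'nudge' adjustment to (P(IBD=0), P(IBD=1), P(IBD=2)).

    If PI_HAT^2 < P(IBD=2), replace the estimates with the values
    (1-p)^2, 2p(1-p), p^2, where p is the current PI_HAT.
    Otherwise return the inputs unchanged.
    """
    p = z2 + 0.5 * z1  # current PI_HAT
    if p * p < z2:
        return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
    return (z0, z1, z2)


# Illustrative input: too much mass on IBD=2 for PI_HAT = 0.4.
print(nudge(0.6, 0.0, 0.4))
```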
This estimator requires fairly accurate minor allele frequencies to work properly. Use --read-freq if you do not think your immediate dataset's empirical MAFs are representative.
--genome jobs can be subdivided with --parallel, which is substantially easier to use than PLINK 1.07 --genome-lists. (Since we are not aware of other practical applications of --genome-lists, that flag has been provisionally retired; contact us if you still need it.)
We may add more sophisticated IBD estimation routine(s) in the future if there is sufficient interest.
--homozyg [{group | group-verbose}] ['consensus-match'] ['extend'] ['subtract-1-from-lengths']
--homozyg-snp <min SNP count>
--homozyg-kb <min length>
--homozyg-density <max inverse density (kb/SNP)>
--homozyg-gap <max internal gap kb length>
--homozyg-het <max hets>
--homozyg-window-snp <scanning window size>
--homozyg-window-het <max hets in scanning window hit>
--homozyg-window-missing <max missing calls in scanning window hit>
--homozyg-window-threshold <min scanning window hit rate>
If any of these flags are present, a set of run-of-homozygosity reports is generated using PLINK 1.07's scanning algorithm. See the original documentation for more details.
- You may also want to try 'bcftools roh', which uses a HMM-based detection method. (We'll include a basic port of that command in PLINK 2.0 if there is sufficient interest.)
- If you're satisfied with all the default settings described below, just use --homozyg with no modifiers. Otherwise, --homozyg lets you change a few binary settings:
- The 'group[-verbose]' modifier adds a report on pools of overlapping runs of homozygosity. (This is triggered by --homozyg-match as well.) 'group-verbose' also produces a detailed report for each pool.
- With 'group[-verbose]', 'consensus-match' causes pairwise segmental matches to be called based only on the SNPs in the entire pool's consensus segment, rather than all the SNPs in the pairwise intersection.
- Due to how the scanning algorithm works, it is possible for a reported run of homozygosity to be adjacent to a few unincluded homozygous variants. This is generally harmless, but if you wish to extend the ROH to include them, use the 'extend' modifier. (Note that the --homozyg-density bound can prevent extension, and --homozyg-gap affects which variants are considered adjacent.)
- By default, segment bp lengths are calculated as (<end bp position> - <start bp position> + 1). This is a minor change from PLINK 1.07, which does not add 1 at the end. For testing purposes, you can use the 'subtract-1-from-lengths' modifier to apply the old formula.
- By default, only runs of homozygosity containing at least 100 SNPs, and of total length ≥ 1000 kilobases, are noted. You can change these minimums with --homozyg-snp and --homozyg-kb, respectively.
- By default, a ROH must have at least one SNP per 50 kb on average; change this bound with --homozyg-density.
- By default, if two consecutive SNPs are more than 1000 kb apart, they cannot be in the same ROH; change this bound with --homozyg-gap.
- By default, a ROH can contain an unlimited number of heterozygous calls; you can impose a limit with --homozyg-het. (This flag was silently ignored by PLINK 1.07.)
- By default, the scanning window contains 50 SNPs; change this with --homozyg-window-snp.
- By default, a scanning window hit can contain at most 1 heterozygous call and 5 missing calls; change these limits with --homozyg-window-het and --homozyg-window-missing, respectively.
- By default, for a SNP to be eligible for inclusion in a ROH, the hit rate of all scanning windows containing the SNP must be at least 0.05; change this threshold with --homozyg-window-threshold.
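The length formula and the default reporting minimums listed above can be sketched as follows. This is only an illustration of the arithmetic, not PLINK's scanning algorithm: the function names are invented, density is assumed to be segment length in kb divided by SNP count, and treating values exactly on a bound as passing is also an assumption.

```python
def segment_kb(start_bp, end_bp, subtract_1_from_lengths=False):
    """Segment length in kb.

    Default: (end - start + 1) bp. With subtract_1_from_lengths=True,
    the older (end - start) formula is used instead.
    """
    length_bp = end_bp - start_bp + (0 if subtract_1_from_lengths else 1)
    return length_bp / 1000.0


def passes_default_minimums(start_bp, end_bp, n_snps,
                            min_snps=100, min_kb=1000.0, max_kb_per_snp=50.0):
    """Check a candidate run against the --homozyg-snp, --homozyg-kb and
    --homozyg-density defaults quoted above (inclusive bounds assumed)."""
    kb = segment_kb(start_bp, end_bp)
    return (n_snps >= min_snps
            and kb >= min_kb
            and kb / n_snps <= max_kb_per_snp)


# Illustrative coordinates: a 1,000,000 bp run containing 100 SNPs.
print(segment_kb(1_000_001, 2_000_000))
print(segment_kb(1_000_001, 2_000_000, subtract_1_from_lengths=True))
print(passes_default_minimums(1_000_001, 2_000_000, n_snps=100))
```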
--homozyg-match <min overlap rate>
--pool-size <min pool size>
In a "--homozyg group[-verbose]" run, pools of overlapping ROH are formed, then pairwise allelic matches within each pool are identified, then allelic-match groups are formed based on these matches. (More precisely, each group has a reference member marked with an appended '*' in the .hom.overlap 'GRP' column, and all other members of the group have pairwise allelic matches with the reference member.) By default, a pairwise match is defined as 0.95 or greater concordance between segments across jointly homozygous variants; you can change this threshold with --homozyg-match.
--pool-size excludes all pools with fewer than the given number of segments from the report(s).
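To make the pairwise match definition concrete, here is a minimal sketch of a concordance check over jointly homozygous variants. It is a simplified reading of the description above, not PLINK's implementation: genotypes are assumed to be coded as alternate allele counts (0, 1, 2, with None for missing), both inputs are assumed to cover the same variants in the same order, and the function names are invented for this example.

```python
def allelic_match_rate(geno1, geno2):
    """Fraction of jointly homozygous variants where both samples carry
    the same homozygous genotype. Returns None if there are no such variants.
    """
    # Keep only positions where both calls are homozygous (0 or 2);
    # heterozygous (1) and missing (None) calls are excluded.
    joint = [(a, b) for a, b in zip(geno1, geno2)
             if a in (0, 2) and b in (0, 2)]
    if not joint:
        return None
    return sum(a == b for a, b in joint) / len(joint)


def is_allelic_match(geno1, geno2, min_rate=0.95):
    """Apply a --homozyg-match style threshold (default 0.95)."""
    rate = allelic_match_rate(geno1, geno2)
    return rate is not None and rate >= min_rate


# Illustrative genotypes only.
g1 = [0, 2, 2, 0, 1, None, 2]
g2 = [0, 2, 0, 0, 2, 2, 2]
print(allelic_match_rate(g1, g2))
print(is_allelic_match(g1, g2))
```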