Report postprocessing
--clump ['zs'] ['cols='<col set desc.>] <PLINK report filename(s)...>
--clump-p1 <index variant p-value threshold>
--clump-p2 <SP2 column p-value threshold>
--clump-r2 <r^2 threshold>
--clump-kb <clump kb radius>
--clump-unphased
--clump-log10 ['input-only' | 'output-only']
--clump-log10-p1 <-log10(index variant p-value threshold)>
--clump-log10-p2 <-log10(SP2 column p-value threshold)>
--clump-bins <p-value bin boundaries...>
--clump-id-field <field name(s)...>
(alias: --clump-snp-field)
--clump-p-field <field name(s)...>
(alias: --clump-field)
--clump-a1-field [field name(s)...]
--clump-test-field [field name(s)...]
--clump-force-a1
--clump-test <test name(s)...>
--clump-allow-overlap
--clump-range <filename>
--clump-range0 <filename>
--clump-range-border <kbs>
When there are multiple significant association p-values in the same region, LD should be taken into account when interpreting the results. The --clump command is designed to help with this.
--clump loads the named PLINK-format association report(s) (text files with a header line, a column containing variant IDs, and another column containing p-values) and groups results into LD-based clumps, writing a new report to plink2.clumps[.zst]. Multiple filenames can be separated by spaces or commas.
- Clumps are formed around central "index variants" which, by default, must have p-value no larger than 0.0001; change this threshold with --clump-p1. Index variants are chosen greedily starting with the lowest p-value. Variants which meet the --clump-p1 threshold, but have already been assigned to another clump, do not start their own clumps.
- Sites which are less than 250 kb away from an index variant and have r2 larger than 0.5 with it are assigned to that index variant's clump (unless they have previously been assigned to another clump, and --clump-allow-overlap is not in effect). These two thresholds can be changed with --clump-kb and --clump-r2, respectively.
- By default, the r2 values computed by --clump are haplotype-based; maximum likelihood haplotype frequency estimates are applied to unphased data. Use --clump-unphased to change this to unphased r2; the resulting correlation coefficients are less accurate measures of LD, but they are more accurate measures of --glm genotype-column similarity (since --glm also doesn't use phase information).
- When dosages are present, they are now used in the r2 computation.
- As usual, only founders are considered in the r2 computation. If your dataset has a shortage of them, --make-founders may come in handy.
- By default, a p-value histogram is given for each clump, with default bin boundaries 0.0001,0.001,0.01,0.05. You can control the bin boundaries with --clump-bins; provide a comma- or space-separated sequence of increasing numbers.
- Sites within the clump which have association p-value smaller than 0.01 are listed in the 'SP2' column of the main report, and their span is used for --clump-range[0]. This threshold can be adjusted with --clump-p2.
- Given a gene region file with 1-based coordinates, --clump-range causes overlaps between regions and clumps to be reported. --clump-range0 does the same with 0-based input coordinates.
- Overlaps are now reported in the main .clumps[.zst] file; this is a change from PLINK 1.x.
- With either flag, --clump-range-border extends each region's bounds by the given number of kilobases.
- By default, variant IDs are expected to be in the 'ID' column, or if that's absent, 'SNP'. You can change this with the --clump-id-field flag, which takes a space-delimited sequence of field names to search for. With multiple field names, earlier names take precedence over later ones. (The other --clump-...-field flags work the same way.)
- --clump-log10 specifies -log10(p) rather than raw p-value input/output. The 'input-only' and 'output-only' modifiers let you convert from one format to the other.
- By default, p-values are expected to be in the 'P' column (or, with --clump-log10, 'LOG10_P' and 'NEG_LOG10_P' are also recognized); change this with --clump-p-field.
- Multiallelic variants are effectively split in this computation. This requires the input file(s) to contain an effect-allele column; by default, this is expected to be 'A1', but you can change this with --clump-a1-field. A1 alleles aren't normally checked or reported for biallelic variants (since they usually don't affect p-values), but you can change that with --clump-force-a1.
- Entries in the SP2 column are now of the form <variant ID>[(A1 allele)][(file number)]. The A1 component normally only appears for multiallelic variants; --clump-force-a1 changes this. By default, the file-number component now appears only when more than one input file is provided; this can be changed with "cols=+f".
- By default, if there is a 'TEST' column, only lines where the test value is 'ADD' are considered. (This is a change from PLINK 1.x.) These default values can be changed with --clump-test-field and --clump-test, respectively.
- By default, no variant may belong to more than one clump; remove this restriction with --clump-allow-overlap.
- When variant IDs with p ≤ --clump-p1 threshold are present in a --clump input file but missing from the main dataset, they are now written to plink2.clumps.missing_id[.zst]. When (variant ID, A1 allele) pairs with p ≤ --clump-p1 threshold in the --clump input file are ignored because the A1 allele is missing from the main dataset (note that A1 is only checked for multiallelic variants, or when --clump-force-a1 is specified), they are written to plink2.clumps.missing_allele[.zst].
- We have provisionally retired --clump's other bells and whistles; contact us if this is a problem.
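The greedy index-variant selection described in the list above can be sketched in a few lines of Python. This is an illustration of the stated rules only, not PLINK's implementation: the `Variant` type, the `greedy_clump` function, and the caller-supplied `r2` lookup are all hypothetical names, and the handling of --clump-allow-overlap (already-assigned variants still never start their own clumps) is an assumption.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Variant(NamedTuple):
    vid: str    # variant ID
    chrom: str  # chromosome code
    pos: int    # base-pair position
    p: float    # association p-value


def greedy_clump(
    variants: Sequence[Variant],
    r2: Callable[[str, str], float],  # hypothetical LD lookup supplied by the caller
    p1: float = 1e-4,                 # mirrors the --clump-p1 default
    r2_min: float = 0.5,              # mirrors the --clump-r2 default
    kb: float = 250,                  # mirrors the --clump-kb default
    allow_overlap: bool = False,      # mirrors --clump-allow-overlap
) -> Dict[str, List[str]]:
    """Return {index variant ID: [other member IDs]} under the documented rules."""
    assigned = set()
    clumps: Dict[str, List[str]] = {}
    # Index variants are chosen greedily, lowest p-value first.
    for index in sorted(variants, key=lambda v: v.p):
        if index.p > p1:
            break  # remaining variants cannot be index variants
        if index.vid in assigned:
            continue  # already in another clump: does not start its own
        members = []
        for v in variants:
            if v.vid == index.vid or v.chrom != index.chrom:
                continue
            if abs(v.pos - index.pos) >= kb * 1000:
                continue  # must be less than `kb` kilobases away
            if v.vid in assigned and not allow_overlap:
                continue
            if r2(index.vid, v.vid) > r2_min:
                members.append(v.vid)
        assigned.add(index.vid)
        assigned.update(members)
        clumps[index.vid] = members
    return clumps
```

In a real run the r2 values would come from the founders in the main dataset; here they are whatever the `r2` callable returns, so the sketch can be exercised with a small hand-written lookup table.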
The PLINK 1.07 documentation has more discussion of these flags, including a few detailed examples.
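As a rough illustration of the field-name precedence and TEST filtering rules listed earlier, the following Python sketch picks columns from a report header and keeps only the rows that would be considered. The function names `find_field` and `select_rows` are hypothetical, and the parsing details (whitespace-delimited columns, a leading '#' on the header line, rows with an 'NA' p-value being skipped) are assumptions made for the example rather than a description of PLINK's parser.

```python
from typing import List, Optional, Sequence, Tuple


def find_field(header: Sequence[str], names: Sequence[str]) -> Optional[int]:
    """Return the column index of the first name in `names` found in `header`.

    Earlier names take precedence over later ones.
    """
    for name in names:
        if name in header:
            return header.index(name)
    return None


def select_rows(
    lines: Sequence[str],
    id_fields: Sequence[str] = ("ID", "SNP"),  # default variant ID search order
    p_fields: Sequence[str] = ("P",),          # default p-value column
    test_fields: Sequence[str] = ("TEST",),    # default test column
    tests: Sequence[str] = ("ADD",),           # default accepted test value
) -> List[Tuple[str, float]]:
    """Return (variant ID, p-value) pairs for the rows that pass the TEST filter."""
    # Assumption: whitespace-delimited text, header possibly prefixed with '#'.
    header = lines[0].lstrip("#").split()
    id_col = find_field(header, id_fields)
    p_col = find_field(header, p_fields)
    if id_col is None or p_col is None:
        raise ValueError("no variant ID or p-value column found")
    test_col = find_field(header, test_fields)

    rows = []
    for line in lines[1:]:
        tokens = line.split()
        # The test filter applies only when a test column is present.
        if test_col is not None and tokens[test_col] not in tests:
            continue
        if tokens[p_col] == "NA":
            continue  # assumption: rows without a usable p-value are skipped
        rows.append((tokens[id_col], float(tokens[p_col])))
    return rows
```

Passing a different tuple for `id_fields` or `tests` plays the role of --clump-id-field or --clump-test in this sketch.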