S: 22 Oct 2024 (b.7.7) D: 22 Oct 2024
Miscellany

Tabs vs. spaces

By default, old flags usually produce space-delimited output with an attempt at equal column widths[1], while new flags produce tab-delimited output. When a report is not formatted the way you want, the Unix tr command and our prettify utility may come in handy. (Some systems also have a column utility which is similar to prettify.)

tr, with no flags, replaces all instances of one character with another character on a one-for-one basis. So,

  cat plink.dist | tr '\t' ' ' > plink.dist.spaces

makes plink.dist.spaces a copy of plink.dist with all tabs converted to spaces, and

  cat plink.dist.spaces | tr ' ' '\t' > plink.dist2

converts spaces to tabs instead.

When converting spaces to tabs, you'll frequently want to collapse strings of consecutive spaces down to single tabs; this is what tr's -s flag is for. E.g.

  cat plink.genome | tr -s ' ' '\t' > plink.genome.tabs

(To also strip leading and trailing spaces, you can use something like the two-sed pipeline mentioned in the help text below.)

Finally, it can be useful to expand single tabs to multiple spaces in a column-aligned manner, e.g. for easier reading and editing. Not all systems provide a nice way to do this, so PLINK 1.9 is distributed with the prettify utility for the job.

  [chrchang:~/plink-ng]$ prettify

It is not actually necessary to first convert spaces to tabs if you just wish to clean up a misaligned space-delimited file.

  [chrchang:~/plink-ng]$ prettify -ir plink.genome

As with many other Unix programs, '-ir' is acceptable shorthand for "-i -r".

[1]: PLINK 1.07's pretty-printing logic is a bit buggy, but changing output formats can be less safe than just leaving things as they are. So, for now, we've decided to make it easy for you to realign reports on your own instead. (However, we plan to convert practically all functions over to tab-delimited output in PLINK 2.0.)
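The tr conversions described under "Tabs vs. spaces" can be tried on a small sample file before touching a real report. The sketch below is illustrative only: the file names and contents are made up, and the two-command sed step is an assumed way to strip leading and trailing spaces, not necessarily the pipeline from prettify's help text.

```shell
# Work in a scratch directory so no real report is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical space-aligned report with leading spaces (not real PLINK output).
printf '  FID1   IID1   0.25\n  FID2   IID2   0.50\n' > sample.genome

# Convert spaces to tabs; -s squeezes each run of resulting tabs down to one.
# The leading spaces survive as a single leading tab.
tr -s ' ' '\t' < sample.genome > sample.genome.tabs

# Assumed approach: strip leading and trailing spaces first, then convert.
sed -e 's/^ *//' -e 's/ *$//' sample.genome | tr -s ' ' '\t' > sample.genome.clean

# Reverse direction: each tab becomes exactly one space.
tr '\t' ' ' < sample.genome.clean > sample.genome.spaces
```

Comparing sample.genome.tabs with sample.genome.clean shows why the stripping step matters: without it, every line keeps a leading tab, which many downstream tools read as an empty first column.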
Flag/parameter reuse

  --script <filename>

--script loads the specified text file and applies all the command-line flags and parameters contained within. This is handy if you use the same QC filters across multiple runs and datasets.

--rerun loads the specified PLINK 1.9 log (defaulting to plink.log) and causes all commands to be rerun. The same parameter(s) will be used for each flag, except when the same flag is included on the current command line with different parameter(s).

Version information

  --version

--version causes PLINK to only print its version number before exiting.

Console output suppression

  --silent

--silent prevents PLINK from printing regular output to the console. (The usual logging will still occur, and error-output is not suppressed.)

--gplink currently has a similar effect, but it should only be used by gPLINK. (If gPLINK is updated in the future, its developers may change this flag's behavior.)

System resource usage

  --memory <main workspace size, in MB>

By default, PLINK 1.9 tries to reserve half of your system's RAM for its main workspace. If this amount is insufficient for your current job, or if it causes unwanted interference with other running processes (e.g. you're using GNU parallel to run single-threaded instances of PLINK on each chromosome simultaneously), you can use --memory to adjust this behavior.

32-bit PLINK limits workspace size to roughly 2 GB.

There are a few items (most notably, multi-character allele names) which are saved outside the main workspace. As a result, there are corner cases where decreasing the --memory parameter may enable a run to complete. (This situation is unlikely, since PLINK 1.9 explicitly reserves 64 MB of non-workspace memory.)

  --threads <max>

By default, multithreaded PLINK functions employ about as many concurrent threads as your system has available logical cores. (More precisely, PLINK currently sets the maximum thread count to sysconf(_SC_NPROCESSORS_ONLN), minus 1 if that number is greater than 8. This is a bit arbitrary, but we've found it to work well in practice so far.) Occasionally, you'll want to change this number: perhaps sysconf() is reporting an inaccurate number (not uncommon with AMD processors), or some of your cores are already fully occupied with other tasks. This can be done with --threads.

--threads has one known limitation: some BLAS/LAPACK linear algebra operations are multithreaded in a way that PLINK cannot control. If this is problematic, you should recompile against single-threaded BLAS/LAPACK.

Name range delimiter

  --d <delimiter>

By default, PLINK commands accepting multiple name ranges (e.g. --snps, --covar-name, --lasso-select-covars, --ld-snps) expect ranges to be denoted with a single dash, with no space on either side of the dash. E.g. in

  --snps rs1111-rs2222, rs3333, rs4444

'rs1111-rs2222' denotes all variants between rs1111 and rs2222 inclusive.

--d lets you designate a non-dash character for this purpose, which can be essential if your IDs contain dashes. E.g.

  --d : --snps SNP_A-8395068:SNP_A-8303431

tells --snps to act on all variants between SNP_A-8395068 and SNP_A-8303431 inclusive.

Reproducible pseudorandom number sequences

  --seed <integer...>
  --perm-batch-size <value>

--seed initializes the pseudorandom number generator with the given seed(s). Each seed must be a 32-bit unsigned integer (i.e. between 0 and 4294967295 inclusive).

When performing a permutation test on a quantitative trait, using --linear/--logistic, or conducting a set-based test, --perm-batch-size sets the number of permutations in each pass. (The current default is 512 across all systems, but we may vary it in a system- and/or dataset-dependent fashion in the future for performance reasons.)

Due to the technical details of how PLINK generates permutations when employing multiple threads, you may need to use --perm-batch-size, --threads, and --seed together to ensure reproducible results. (For case/control --assoc/--model permutation tests, --threads + --seed is currently adequate.) Note that you may also need to retrieve an older version of PLINK in order to reproduce a run.

Faster but less reproducible linear algebra

  --native

By default, when the same plink binary is run with the same flags, workspace size, thread count, and random seed, the results should be reproducible across machines with different processors. (This was not necessarily true on Linux before 19 Oct 2020.) To allow Intel MKL to use processor-dependent code paths that can yield slightly different linear algebra results, add the --native flag.

P-value underflow

  --output-min-p <threshold>

By default, p-values too small to be represented by an ordinary floating-point number are reported as '0' or 'INF'. This can create problems for log(p) plots and the like. One workaround is --output-min-p, which prevents PLINK from reporting non-empirical p-values below the given threshold. (Other reported statistics are not affected, so you can e.g. infer the true p-value from the reported Z-statistic.)

Reliable logging

  --debug

Normally, PLINK 1.9 does not force log entries to be written to disk immediately. However, when PLINK crashes unexpectedly (e.g. via segmentation fault), this may cause the log to be incomplete. --debug prevents this from happening.

Redundant flags

The following PLINK 1.07 flags have been retired, since they are redundant with omnipresent utilities. (Talk to your system administrator if no programs for handling these operations appear to be installed on your machine.)

  --compress (use e.g. "gzip <filename>")
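As a rough illustration of the gzip replacement for --compress, the sketch below compresses a small file and reads it back without restoring it on disk. The file name and contents are hypothetical stand-ins for a report, and only standard gzip options are used.

```shell
# Work in a scratch directory with a made-up report file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'FID1\tIID1\t0.25\n' > example.report

# gzip replaces example.report with example.report.gz.
gzip example.report

# Decompress to standard output to inspect the contents;
# the .gz file stays in place.
first_line=$(gzip -dc example.report.gz | head -n 1)
printf '%s\n' "$first_line"
```

Running gzip -d example.report.gz afterwards would restore the uncompressed file and remove the .gz copy.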