General usage
Getting started
After downloading and unzipping PLINK 1.9, you should see the main PLINK 1.9 binary, the GPLv3 license, the prettify utility for generating clean space-delimited text tables, and the small files toy.ped and toy.map. Try the command
./plink --file toy --freq --out toy_analysis
You should see something like:
PLINK v1.90b6.9 64-bit (4 Mar 2019) www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to toy_analysis.log.
Options in effect:
--file toy
--freq
--out toy_analysis
4096 MB RAM detected; reserving 2048 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (2 variants, 2 people).
--file: toy_analysis-temporary.bed + toy_analysis-temporary.bim +
toy_analysis-temporary.fam written.
2 variants loaded from .bim file.
2 people (2 males, 0 females) loaded from .fam.
2 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Calculating allele frequencies... done.
Total genotyping rate is 0.75.
--freq: Allele frequencies (founders only) written to toy_analysis.frq .
(If it fails, you might have downloaded the wrong package for your machine. Double-check the Downloads table; if you're still stumped, our plink2-users Google group may help.)
Okay, what did my command mean? And what just happened?
PLINK 1.9 parses each command line as a collection of flags, each of which starts with two dashes (see note 1), plus the parameters for those flags. A parameter immediately follows its flag, and never starts with a dash unless that dash is immediately followed by a digit. So the command above included three flags: --file, --freq, and --out. They specify the following three things, which are part of almost every PLINK run:
- Input data: "--file toy" tells PLINK to use the genomic data in the text files toy.ped and toy.map. You'll see several other ways to specify input data on the next page.
- Calculation(s) to perform (see note 2): --freq tells PLINK to generate an allele frequency report. The full range of supported calculations is summarized under "Main functions" in the sidebar, and the formats of all reports are described in the file formats appendix.
- An output file prefix: We'll elaborate on this in a moment.
So this particular combination makes PLINK calculate allele frequencies in toy.ped + toy.map, and write a report to toy_analysis.frq.
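If you want to post-process the report in a script, here is a minimal Python sketch of a .frq reader. It assumes the PLINK 1.x .frq column layout (CHR, SNP, A1, A2, MAF, NCHROBS), where MAF is the frequency of allele A1 and NCHROBS is the number of allele observations; it splits on runs of whitespace and tolerates an "NA" frequency by mapping it to NaN. The names FreqRecord and read_frq are illustrative, not part of PLINK.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FreqRecord:
    chrom: str
    snp: str
    a1: str
    a2: str
    maf: float    # frequency of allele A1; NaN if the report says NA
    nchrobs: int  # number of allele observations


EXPECTED_HEADER = ["CHR", "SNP", "A1", "A2", "MAF", "NCHROBS"]


def read_frq(text: str) -> List[FreqRecord]:
    """Parse the text of a .frq report whose columns are padded with spaces."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].split() != EXPECTED_HEADER:
        raise ValueError("not a .frq report with the expected header")
    records = []
    for ln in lines[1:]:
        chrom, snp, a1, a2, maf, nchrobs = ln.split()
        records.append(FreqRecord(chrom, snp, a1, a2,
                                  math.nan if maf == "NA" else float(maf),
                                  int(nchrobs)))
    return records
```

For the run above you would call it as read_frq(open("toy_analysis.frq").read()).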
If you have PLINK 1.07 installed, try running the same command with it: you should get exactly the same report, down to the last byte. We are aiming for this level of concordance across almost all PLINK 1.07 commands where it might be wanted.
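To check the "down to the last byte" claim yourself, give the PLINK 1.07 run a different --out prefix so it does not overwrite the 1.9 report, then compare the two files byte by byte. A minimal Python sketch, assuming a hypothetical prefix such as toy_analysis_107 for the 1.07 run:

```python
import filecmp
from pathlib import Path
from typing import Optional


def reports_identical(path_a: str, path_b: str) -> bool:
    """True if both files contain exactly the same bytes."""
    # shallow=False compares file contents instead of trusting matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


def first_difference(path_a: str, path_b: str) -> Optional[int]:
    """Byte offset of the first mismatch, or None if the files are identical."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # No mismatch in the shared prefix: the files differ only if their lengths differ.
    return None if len(a) == len(b) else min(len(a), len(b))
```

For example, reports_identical("toy_analysis.frq", "toy_analysis_107.frq") should return True if the two versions agree; first_difference tells you where to start looking if they do not.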
1: Actually, that was a lie. With the exceptions of --1 and --23file, PLINK 1.9 allows you to use a single dash in front of each flag. In exchange for saving you some keystrokes, please do yourself a favor and avoid filenames that begin with a dash.
2: PLINK 1.9 is usually less strict than PLINK 1.07 when it comes to allowing multiple calculations in a single run. See the order of operations page for details.
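The flag/parameter rule described at the start of this section, together with note 1, is simple enough to sketch in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not PLINK's actual parser (which also checks that each flag exists and has a valid number of parameters); the function names are hypothetical, and argv is assumed to exclude the program name.

```python
from typing import List, Tuple


def is_flag(token: str) -> bool:
    """Return True if a command-line token starts a new flag."""
    if token.startswith("--"):
        return True  # two dashes always mark a flag, which covers --1 and --23file
    # A single dash followed by a non-digit is also a flag; "-5" stays a parameter.
    return len(token) > 1 and token[0] == "-" and not token[1].isdigit()


def parse_command_line(argv: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (flag name, parameters) pairs, in command-line order."""
    parsed: List[Tuple[str, List[str]]] = []
    for token in argv:
        if is_flag(token):
            parsed.append((token.lstrip("-"), []))
        elif parsed:
            parsed[-1][1].append(token)  # parameters attach to the most recent flag
        else:
            raise ValueError(f"parameter {token!r} does not follow any flag")
    return parsed
```

Applied to the example command, parse_command_line(["--file", "toy", "--freq", "--out", "toy_analysis"]) yields [("file", ["toy"]), ("freq", []), ("out", ["toy_analysis"])]. The digit rule is also why --1 and --23file cannot be shortened: "-1" and "-23file" would be read as parameters.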
Interpreting our flag usage summaries
The rest of this documentation has many one-line summaries describing the parameter sets accepted by particular flags, followed by discussions of flag functionality and the effects of optional parameters. We use the following conventions in our one-line usage summaries (these were adjusted in March 2019 to be more consistent with community norms):
- <angle brackets> denote a required parameter, where the text between the brackets describes its nature.
- ['square brackets + single-quotes'] denotes an optional modifier. Use the EXACT text in the quotes; e.g. "--freq gz" is valid given the summary
--freq [{counts | case-control}] ['gz']
- [{bar|separated|braced|bracketed|values}] denotes a collection of mutually exclusive optional modifiers (again, the exact text must be used). When there are no outer square brackets, one of the choices must be selected.
- ['quoted_text='<description of value>] denotes an optional modifier that must begin with the quoted text, and be followed by a value with no whitespace in between. '|' may also be used here to indicate mutually exclusive options. E.g. "--assoc perm" and "--assoc mperm=10000" are both valid, and "--assoc perm mperm=10000" is invalid, given the summary
--assoc ['perm' | 'mperm='<value>] ...
- [square brackets without quotes or braces] denote an optional parameter, where the text between the brackets describes its nature.
- An ellipsis (...) indicates that you can enter multiple parameters of the specified type.
- Background color summarizes degree of similarity to previously existing functionality. Green signals perfect compatibility: you can use the basic flag in exactly the same manner as you previously have in PLINK 1.07/GCTA/etc. (Note that green does not guarantee the absence of additional options.) Yellow signals slightly different functionality and/or command-line usage, and blue signals that the flag is new to PLINK 1.9.
- If parts of our current implementation are known or strongly suspected to be incomplete, that is signaled with red text. So red text on a green background indicates that we plan to provide perfect compatibility, but we have more coding and/or testing to do before we get there.
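To make the modifier notation concrete, here is a hedged Python sketch that checks a list of modifiers against a summary encoded by hand. Each Slot stands for one bracketed group, and a choice ending in '=' models the ['quoted_text='<value>] form. The encoding, the names Slot and validate, and the assumption that modifier order does not matter are all illustrative; PLINK does not expose its usage summaries in this form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    """One bracketed group from a usage summary."""
    choices: List[str]      # mutually exclusive texts; a trailing '=' expects a value
    required: bool = False  # True for {braced} choices without outer square brackets


def matches(choice: str, modifier: str) -> bool:
    if choice.endswith("="):
        # 'quoted_text='<value>: the prefix must be followed directly by a non-empty value.
        return modifier.startswith(choice) and len(modifier) > len(choice)
    return modifier == choice  # plain quoted text must match exactly


def validate(modifiers: List[str], slots: List[Slot]) -> bool:
    used = [False] * len(slots)
    for mod in modifiers:
        hits = [i for i, slot in enumerate(slots)
                if any(matches(c, mod) for c in slot.choices)]
        if not hits or used[hits[0]]:
            return False  # unknown modifier, or a second pick from one exclusive group
        used[hits[0]] = True
    # Every required group must have been used.
    return all(u or not slot.required for u, slot in zip(used, slots))


# --freq [{counts | case-control}] ['gz']
FREQ = [Slot(["counts", "case-control"]), Slot(["gz"])]
# --assoc ['perm' | 'mperm='<value>] ...   (only this first group is modeled)
ASSOC = [Slot(["perm", "mperm="])]
```

With these encodings, validate(["gz"], FREQ) and validate(["mperm=10000"], ASSOC) return True, while validate(["perm", "mperm=10000"], ASSOC) returns False, matching the examples above.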
If you're already familiar with PLINK, this should help you skim over stuff you already know. If there are just one or two flags you need to look up, you can quickly find what you need in the sidebar; try the search box if the correct page isn't immediately apparent.
For the newer bioinformaticians out there, here's our first full flag description.
Setting the output file prefix
--out <prefix>
By default, the output files generated by PLINK all have names of the form 'plink.<extension>', where the extension indicates the type of file (e.g. .frq for an --freq report, .log for the log). This is fine for a single run, but once you start running PLINK repeatedly, later runs will overwrite the results of earlier ones.
Therefore, you usually want to choose a different output file prefix for each run. --out causes 'plink' to be replaced with the prefix you provide. E.g. in the example above, "--out toy_analysis" caused PLINK to create a file named toy_analysis.frq instead of plink.frq.
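If you launch several runs from a script, the naming rule is easy to capture. This Python sketch only computes file names and argument lists (it does not call PLINK); output_path and build_command are hypothetical helpers, and './plink' assumes the binary sits in the current directory, as in the example at the top of this page.

```python
from typing import List, Optional

DEFAULT_PREFIX = "plink"


def output_path(extension: str, out_prefix: Optional[str] = None) -> str:
    """Name of an output file: '<prefix>.<extension>', with 'plink' as the default prefix."""
    prefix = DEFAULT_PREFIX if out_prefix is None else out_prefix
    return f"{prefix}.{extension}"


def build_command(input_prefix: str, calc_flags: List[str],
                  out_prefix: Optional[str] = None) -> List[str]:
    """Assemble an argument list like the one in the toy example."""
    argv = ["./plink", "--file", input_prefix, *calc_flags]
    if out_prefix is not None:
        argv += ["--out", out_prefix]
    return argv
```

Giving each run its own out_prefix keeps output_path distinct across runs; the resulting list can be passed to subprocess.run(..., check=True) if you want the script to launch PLINK itself.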
Since the prefix is a required parameter, invoking --out without it will cause PLINK to quit during command line parsing:
[chrchang:~/plink-ng]$ ./plink --file toy --freq --out
PLINK v1.90b6.9 64-bit (4 Mar 2019) www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang GNU General Public License v3
Error: Missing --out parameter.
For more information, try "plink --help <flag name>" or "plink --help | more".
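A check of the kind PLINK performs here can be sketched as follows. It takes (flag, parameters) pairs, such as those obtained by grouping tokens under the flag/parameter rule described earlier, and reproduces the message shown in the transcript. Only --out is confirmed above to require a parameter; the table name and function name are hypothetical, and a real table would list every such flag.

```python
from typing import List, Optional, Tuple

# Flags that need a parameter. Only "out" is confirmed by the transcript above;
# any further entries would be assumptions.
REQUIRED_PARAM_FLAGS = {"out"}


def check_required_params(parsed: List[Tuple[str, List[str]]]) -> Optional[str]:
    """Return an error message for the first flag missing its required parameter."""
    for flag, params in parsed:
        if flag in REQUIRED_PARAM_FLAGS and not params:
            return f"Error: Missing --{flag} parameter."
    return None
```

Because the check runs on the parsed command line, the error appears before any input file is read, which is why the transcript above stops right after the banner.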
In the rest of this documentation, we will continue highlighting full command lines in purple, default parameter values in orange, and sample parameter values you can freely change in green.
If you use PLINK 1.9 in any published work, please cite both the software (as an electronic resource/URL):
Package : PLINK [version]
Authors : Shaun Purcell, Christopher Chang
URL : www.cog-genomics.org/plink/1.9/
and the manuscript(s) describing the methods you used. Our primary methods paper is:
Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4.
PLINK 1.9 includes implementations of many analyses that were developed by other teams. The original sources are summarized below.
- Methods introduced in PLINK 1.0:
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, Bender D, Maller J, Sklar P, de Bakker P, Daly MJ, Sham PC (2007) PLINK: A Tool Set for Whole-Genome and Population-Based Linkage Analyses. American Journal of Human Genetics, 81.
- --hardy/--hwe:
Wigginton JE, Cutler DJ, Abecasis GR (2005) A note on exact tests of Hardy-Weinberg equilibrium. American Journal of Human Genetics, 76.
Graffelman J, Moreno V (2013) The mid p-value in exact tests for Hardy-Weinberg equilibrium. Statistical Applications in Genetics and Molecular Biology, 12. (if mid-p adjustment is applied)
- --ld/--blocks:
Gaunt T, Rodríguez S, Day I (2007) Cubic exact solutions for the estimation of pairwise haplotype frequencies: implications for linkage disequilibrium analyses and a web tool 'CubeX'. BMC Bioinformatics, 8.
Taliun D, Gamper J, Pattaro C (2014) Efficient haplotype block recognition of very long and dense genetic sequences. BMC Bioinformatics, 15. (if --blocks is used)
- GRM-related functions:
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: A Tool for Genome-wide Complex Trait Analysis. American Journal of Human Genetics, 88.
- --assoc/--model permutation test:
Steiß V, Letschert T, Schäfer H, Pahl R (2012) PERMORY-MPI: a program for high-speed parallel permutation testing in genome-wide association studies. Bioinformatics, 28.
- --logistic:
Hill A, Loh PR, Bharadwaj RB, Pons P, Shang J, Guinan E, Lakhani K, Kilty I, Jelinsky SA (2017) Stepwise Distributed Open Innovation Contests for Software Development - Acceleration of Genome-Wide Association Analysis. GigaScience, 6.
- --fast-epistasis:
Wan X, Yang C, Yang Q, Xue H, Fan X, Tang N, Yu W (2010) BOOST: A Fast Approach to Detecting Gene-Gene Interactions in Genome-wide Case-Control Studies. American Journal of Human Genetics, 87.
Ueki M, Cordell HJ (2012) Improved statistics for genome-wide interaction analysis. PLOS Genetics, 8. (if joint-effects and/or variance-corrected original test is used)
- --meta-analysis with 'weighted-z':
Willer CJ, Li Y, Abecasis G (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics, 26.