S: 22 Oct 2024 (b.7.7)

D: 22 Oct 2024

Errors and warnings

When PLINK detects that something is nonstandard and/or wrong, it will usually display and log a message to that effect. In order of increasing severity, there are three classes of such messages: 'Note', 'Warning', and 'Error'.

Notes address situations where nothing is actually wrong, but there's something PLINK thought you might want to know. Common Notes include:

"--xyz flag deprecated. Use ..."
This indicates that the interface for a PLINK 1.07 command you're using has been redesigned, and the new interface probably exposes some handy additional options, but you are free to continue doing things the way you always have. Backwards compatibility will not be dropped in future PLINK 1.9 builds.
"--make-bed input and output filenames match. Appending '~' to input filenames."
Since PLINK 1.9 does not keep all input data in memory simultaneously, it's frequently necessary for it to rename input files when they conflict with output filenames; otherwise the following could happen:
1. First block of input data loaded and filtered
2. Filtered data written to new output file; input file is deleted in the process
3. Attempt to load second block of input data fails; PLINK errors out, and worse, most of the input data has been lost
PLINK 1.9 follows the convention, introduced by GNU Emacs, of using appended tilde characters to designate automatic backup files. (Note that these backup files are fair game for clobbering by future automatic backups; when you want them to serve as 'real' backups, rename them.)
"No phenotypes present."
Not a problem if you aren't performing any association analysis, or if you're explicitly loading phenotype data with --pheno when necessary.
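
The rename-before-write hazard described above for --make-bed can be illustrated with a small Python sketch. This is not PLINK's actual implementation: `protect_input` and `demo` are hypothetical names, and the sketch only mimics the appended-tilde convention for the case where input and output paths coincide.

```python
import os
import tempfile


def protect_input(input_path, output_path):
    """Hypothetical helper: if writing output_path would overwrite input_path,
    move the input to '<name>~' first and return the path to read from."""
    if os.path.abspath(input_path) != os.path.abspath(output_path):
        return input_path
    backup_path = input_path + "~"
    # os.replace silently overwrites an older backup, which mirrors the caveat
    # that automatic backup files may be clobbered by later backups.
    os.replace(input_path, backup_path)
    return backup_path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        bed = os.path.join(tmp, "data.bed")
        with open(bed, "wb") as f:
            f.write(b"original genotype blocks")
        # Input and output names match, so the input is renamed first.
        source = protect_input(bed, bed)
        with open(source, "rb") as src:
            data = src.read()
        with open(bed, "wb") as dst:
            dst.write(data[:8])  # stand-in for "filtered" output
        with open(source, "rb") as f:
            backup_contents = f.read()
        with open(bed, "rb") as f:
            new_contents = f.read()
        return os.path.basename(source), backup_contents, new_contents


print(demo())
```

Because the original bytes survive under the tilde name, a failure partway through writing the new file would not destroy the input.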

A Warning indicates that something is likely to be wrong, but it's not fatal. Common Warnings include:

"No output requested. Exiting." (followed by basic usage information)
This happens when you specified input file(s) but did not say what should be done with them; work out which command flag is missing, then rerun your command.
"<number> het. haploid genotypes present (see plink.hh )."
This is usually caused by male heterozygous calls in the X chromosome pseudo-autosomal region. Check the variants named in the .hh file; if they are all near the beginning or end of the X chromosome, --split-x should solve the problem.
It can also be caused by incorrect sex information and/or an incorrect chromosome set.
We strongly recommend addressing this warning as soon as you notice it.
"Nonmissing nonmale Y chromosome genotype(s) present."
This implies the presence of incorrect sex information and/or an incorrect chromosome set. Most PLINK operations treat heterozygous haploid and nonmale Y genotypes as missing, but data conversion operations preserve them (so calls aren't lost when e.g. you fix the data file's sex column) unless you override with --set-hh-missing.
"Underscore(s) present in sample IDs."
When using --recode vcf, sample IDs are formed by merging the FID and IID and placing an underscore between them. When the FID or IID already contains an underscore, this may make it difficult to reconstruct them from the VCF file; you may want to replace underscores with a different character in PLINK files (Unix tr is handy here).
"QT --assoc doesn't handle X/Y/MT/haploid variants normally (try --linear)."
"--gxe doesn't currently handle X/Y/MT/haploid variants properly."
These alert you to limitations of the --assoc and --gxe flags. Rerun with --autosome[-xy], and/or use --linear to analyze sex and haploid chromosomes.
"Per-sample --mds-plot can be very slow with over 5000 people. (Consider using the 'by-cluster' modifier.)"
If the run does turn out to be unacceptably slow, try rerunning --mds-plot with 'by-cluster' while creating 5000 or so tight clusters.
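
The "Underscore(s) present in sample IDs" warning above comes down to an ambiguous join. The following Python sketch assumes only the FID_IID convention described for --recode vcf and the usual six-column .fam layout (FID, IID, father ID, mother ID, sex, phenotype). The function names are illustrative and not part of PLINK.

```python
def vcf_sample_id(fid, iid):
    """Join FID and IID with an underscore, as described for --recode vcf."""
    return f"{fid}_{iid}"


def split_candidates(sample_id):
    """Return every (FID, IID) pair that could have produced sample_id."""
    last = len(sample_id) - 1
    return [
        (sample_id[:i], sample_id[i + 1:])
        for i, ch in enumerate(sample_id)
        if ch == "_" and 0 < i < last
    ]


def sanitize_fam_line(line, replacement="-"):
    """Replace underscores in the four ID columns of one .fam line.

    Parental IDs are rewritten too, so they still match the renamed IIDs.
    Assumes the replacement character does not cause new ID collisions.
    """
    fields = line.split()
    fields[:4] = [field.replace("_", replacement) for field in fields[:4]]
    return " ".join(fields)


# Two different (FID, IID) pairs collapse to the same VCF sample ID.
print(vcf_sample_id("FAM_1", "S2") == vcf_sample_id("FAM", "1_S2"))
print(split_candidates("FAM_1_S2"))
print(sanitize_fam_line("FAM_1 S_2 0 0 1 2"))
```

With a single underscore in the merged ID there is exactly one candidate split, which is why cleaning the IDs before conversion makes the FID and IID recoverable.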

Finally, an Error is a fatal problem that causes PLINK to terminate immediately. Common Errors include:

"No input dataset."
The inverse of "No output requested"; rerun your command with the appropriate input flag(s). (This is classified as an Error instead of a Warning because the basic usage info message is less likely to help you fix the problem.)
"--xyz conflicts with another input flag."
You specified multiple input flags of different types. PLINK 1.9 requires you to name exactly one main input fileset; two or more are not allowed. (--bmerge/--merge-list may have the functionality you're looking for.)
"Failed to open <filename>."
You probably mistyped a filename, included a file extension when you shouldn't have (e.g. --bfile), or failed to include a file extension when you should have (e.g. --vcf).
"Line <1-based number> of <filename> is pathologically long."
In many cases, we didn't think it made any sense for a particular type of input file to have a line longer than ~128k characters, so we made PLINK 1.9 error out upon contact with such a line. (This sometimes happens when you enter a wrong filename.) If you have a legitimate need to work with longer lines, let us know and we'll probably revise the relevant command(s) to handle them.
Much more rarely, PLINK 1.9's two-gigabyte "long line" loading buffer (applied when dealing with e.g. .ped or .vcf files) won't be enough. In this event, we humbly suggest that you may want to revise your pipelines to avoid the generation of text files with > 2 GB lines...
"Out of memory. The --memory flag may be helpful."
This is most likely to happen with very large variant sets, 80+ character variant ID(s), very large sample x sample matrix computations, lots of long, fully-spelled-out indels... and silly mistakes on our part (e.g. we just verified you did have enough memory, but we forgot a 'not' in our code and did not have a test for that code branch).

Machines with 32-bit operating systems or ≤ 4 GB RAM struggle with very large variant sets (e.g. the ~40 million tracked by 1000 Genomes phase 1), because PLINK 1.9 still tends to require ~50-70 bytes of memory per variant. In this case, it is usually still best to split the dataset by chromosome, for performance reasons if nothing else (even if PLINK can be given enough memory, there won't be much left for the disk cache). Fortunately, even with ~40 million variants, 4 GB RAM is usually enough to allow PLINK to split the dataset on 64-bit systems, if you close most other programs and use an increased --memory setting.
If you have a super-long sample/variant ID, find a way to shorten it in your PLINK files. (And your other files; many other programs will have problems with this.)
--parallel can split most large matrix computations into pieces small enough to fit into memory. --cluster and --lasso are exceptions, but in both cases ways to greatly reduce memory consumption are already known (so drop us a note if you want us to prioritize their implementation).
If your data contains many long, fully-spelled-out indels, those are currently saved outside the main workspace, and as a result decreasing the --memory setting would actually free up more space for them.
And if the error doesn't seem to make any sense (e.g. the same command works just fine in PLINK 1.07), report the bug and we'll fix it ASAP.
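
As a rough back-of-the-envelope check of the per-variant figure quoted above, the following Python sketch multiplies a variant count by the ~50-70 bytes/variant range. The function name is hypothetical, and the result covers only this per-variant overhead, not genotype data or matrix workspace.

```python
def estimate_variant_memory_gib(n_variants, bytes_per_variant_range=(50, 70)):
    """Return (low, high) estimates in GiB for per-variant bookkeeping alone."""
    low_bytes, high_bytes = bytes_per_variant_range
    gib = 1024 ** 3
    return (n_variants * low_bytes / gib, n_variants * high_bytes / gib)


# ~40 million variants, as in the 1000 Genomes phase 1 example.
low, high = estimate_variant_memory_gib(40_000_000)
print(f"{low:.2f}-{high:.2f} GiB")  # prints "1.86-2.61 GiB"
```

An estimate in this range is consistent with the text: it fits within 4 GB only if little else is running, and leaves not much room for the disk cache.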

"--xyz requires a case/control phenotype."
You specified a case/control analysis command, but your phenotype data is not case/control. Check if there's a similar command designed for scalar phenotype data (e.g. --regress-distance for --groupdist). If not, you can use --tail-pheno to downcode scalar phenotype data to case/control.
"--xyz requires a scalar phenotype."
The inverse of the previous error. Look for a similar case/control command. If you can't find one, you can pretend your phenotype data is scalar by e.g. changing a '1' to a '0.999999' in the .fam file, but we do not recommend this for any serious work.
"All people removed ..."
"All variants removed ..."
The filtering flags you specified caused every last sample or every last variant to be excluded from the analysis. Check for things done backwards (e.g. --extract where --exclude was intended) and mistyped thresholds.
"<number> variants with 3+ alleles present. ..."
See the Merge failures documentation.
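
The case/control versus scalar phenotype errors above depend on how the phenotype column is classified. The Python sketch below is only an approximation under stated assumptions: values of -9, 0, 1 and 2 are taken to be case/control codes, non-numeric and NaN entries are taken to be missing, and any other number makes the phenotype scalar. `phenotype_kind` is a hypothetical helper, not a PLINK function, and PLINK's own detection may differ in edge cases.

```python
import math

# Assumed case/control encodings: 1 = control, 2 = case, 0 and -9 = missing.
CASE_CONTROL_CODES = {-9.0, 0.0, 1.0, 2.0}


def phenotype_kind(values):
    """Classify a list of phenotype strings as 'case/control' or 'scalar'."""
    for value in values:
        try:
            number = float(value)
        except ValueError:
            continue  # assumption: non-numeric entries count as missing
        if math.isnan(number):
            continue  # assumption: NaN also counts as missing
        if number not in CASE_CONTROL_CODES:
            return "scalar"
    return "case/control"


print(phenotype_kind(["1", "2", "2", "-9"]))
print(phenotype_kind(["0.999999", "2", "2"]))
```

The second call mirrors the '0.999999' workaround mentioned above: a single value outside the assumed code set is enough to flip the classification.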
