D: 14 Nov 2024

File format reference

This page describes specialized PLINK 2.0 input and output file formats which are identifiable by file extension. (Most extensions not listed here have very simple one-entry-per-line or two-entry-per-line text formats.)

Unless otherwise specified, all multicolumn text files generated by PLINK 2.0 are tab-delimited, with one header line starting with '#'. In the column summaries, columns which are present unless removed by the column set descriptor are boldface, and columns which only appear under some data/flag/modifier combination(s) are italicized.

Jump to: .acount | .adjusted | .afreq | .allele.no.snp | .bcf | .bed | .bgen | .bim | .bins | .clumps | .cov | .eigenvec{,.allele|.var} | .fam | .fst.summary | .fst.var | .gcount | .gen | .glm.firth | .glm.linear | .glm.logistic[.hybrid] | .grm | .grm.N.bin | .grm.bin | .haps | .hardy | .hardy.x | .het | .*.id | .kin0 | .king[.bin] | .legend | .map | .pdiff | .ped | .pgen{,.pgi} | .phy | .psam | .pvar | .raw | .rel[.bin] | .sample | .scount | .sdiff | .sdiff.summary | .sexcheck | .smiss | .sscore | .ssf.tsv | .svd.pheno | .svd.pheno_wts | .tfam | .tped | .traw | .vcf | .vcor | .vcor{1|2}[.bin] | .vmiss | .vscore | .vscore.bin


.acount, .afreq (allele count/frequency report)

Produced by --freq.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
'REF_FREQ'/'REF_CT' | reffreq | Reference allele frequency/dosage
'ALT1_FREQ'/'ALT1_CT' | alt1freq | Alternate allele 1 frequency/dosage
'ALT_FREQS'/'ALT_CTS' | altfreq, alteq, alteqz | Comma-separated freqs/dosages for all alts; 'eq' requests '1=<ALT1 value>,2=<ALT2 value>,...' formatting with zero-values omitted, 'eqz' includes zeroes
'ALT_NUM_{FREQS,CTS}' | altnumeq | Comma-separated freqs/dosages for all alts
'FREQS'/'CTS' | freq, eq, eqz | Comma-separated freqs/dosages for all alleles
'NUM_FREQS'/'NUM_CTS' | numeq | Comma-separated freqs/dosages for all alleles
MACH_R2 | machr2 | MaCH imputation quality metric
MINIMAC3_R2 | minimac3r2 | Minimac3 phased-dosage imputation quality metric; inaccurate unless phased dosages were imported with e.g. "--vcf dosage=HDS" (dosage=DS is not enough)
OBS_CT | nobs | Number of allele observations

.adjusted (basic multiple-testing corrections)

Produced by --adjust[-file].

A text file with a header line, and then one line per tested allele with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | a1 | Tested allele
[NEG_LOG10_]UNADJ | unadj | Unadjusted p-value
[NEG_LOG10_]GC | gc | Devlin & Roeder (1999) genomic control corrected p-value (additive model only)
QQ | qq | P-value quantile
[NEG_LOG10_]BONF | bonf | Bonferroni correction
[NEG_LOG10_]HOLM | holm | Holm-Bonferroni (1979) adjusted p-value
[NEG_LOG10_]SIDAK_SS | sidakss | Šidák single-step adjusted p-value
[NEG_LOG10_]SIDAK_SD | sidaksd | Šidák step-down adjusted p-value
[NEG_LOG10_]FDR_BH | fdrbh | Benjamini & Hochberg (1995) step-up false discovery control
[NEG_LOG10_]FDR_BY | fdrby | Benjamini & Yekutieli (2001) step-up false discovery control

Entries are sorted in increasing p-value order. (Thus, if the QQ field is present, its values just increase linearly.)
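
For readers who want to sanity-check a few of these columns, here is a minimal sketch of the textbook Bonferroni, Holm, and Benjamini-Hochberg adjustments. It follows the standard published definitions, not PLINK's source code, so small numerical differences from a real .adjusted file are possible; the function name is hypothetical.

```python
def adjust_pvalues(pvals):
    """Return (bonferroni, holm, fdr_bh) lists, each in increasing-p order.

    Textbook definitions only; this is an illustrative assumption, not
    PLINK's implementation.
    """
    p = sorted(pvals)
    m = len(p)

    # Bonferroni: multiply by the number of tests, cap at 1.
    bonf = [min(1.0, x * m) for x in p]

    # Holm step-down: multiplier shrinks with rank; enforce monotonicity
    # with a running maximum.
    holm = []
    running = 0.0
    for i, x in enumerate(p):
        running = max(running, min(1.0, (m - i) * x))
        holm.append(running)

    # Benjamini-Hochberg step-up: p * m / rank, made monotone with a
    # running minimum taken from the largest p-value downward.
    bh = [0.0] * m
    running = 1.0
    for i in range(m - 1, -1, -1):
        running = min(running, p[i] * m / (i + 1))
        bh[i] = running

    return bonf, holm, bh
```

Because the inputs are sorted first, the outputs line up with the increasing-p-value row order described above.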


.allele.no.snp (allele mismatch report)

Produced by --update-alleles when there are too many mismatches between the loaded alleles for a variant and the old-allele column(s) of the --update-alleles input file.

A text file with no header line, and one line per mismatching variant with the following three fields:

  1. Variant identifier
  2. Expected allele #1 (from --update-alleles input file)
  3. Remaining expected alleles, comma-separated; or "." if none

.bcf (binary Variant Call Format)

Variant information + sample ID + genotype call binary file. Imported with --bcf, and produced by "--export bcf".

Refer to the hts-specs GitHub repository for a detailed description of the format. "--export bcf" uses binary encoding v2.2.


.bed (PLINK 1 binary biallelic genotype table)

PLINK 1's preferred way to represent genotype calls. Must be accompanied by .bim and .fam files. Loaded with --bfile, and generated by --make-bed.

Do not confuse this with the UCSC Genome Browser's BED format, which is totally different. (It is safe to change a PLINK 1 .bed file's extension to .pgen and use --bpfile to load it.)

See the PLINK 1.9 documentation for a detailed description of the usual variant-major form, along with an example. PLINK 2 can also efficiently export the sample-major form ("--export ind-major-bed"); it has third byte equal to zero instead of one, but is otherwise analogous.
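
As a rough illustration of the third-byte distinction above, the following sketch classifies a .bed file from its first three bytes. The two leading magic bytes (0x6c, 0x1b) are taken from the PLINK 1.9 format description, not from this page, and the function name is hypothetical.

```python
def bed_orientation(header: bytes) -> str:
    """Classify a PLINK 1 .bed file from its first three bytes."""
    # Assumption: magic bytes 0x6c 0x1b, as given in the PLINK 1.9 docs.
    if len(header) < 3 or header[:2] != b"\x6c\x1b":
        raise ValueError("not a PLINK 1 .bed header")
    if header[2] == 1:
        return "variant-major"  # the usual form
    if header[2] == 0:
        return "sample-major"   # e.g. from "--export ind-major-bed"
    raise ValueError("unexpected third byte: %d" % header[2])
```

In practice the three bytes would come from something like `open(path, "rb").read(3)`.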


.bgen (Oxford variant info + genomic data binary file)

Native binary file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. BGEN v1.1 files should always be accompanied by a .sample file. Loaded with --bgen, and produced by "--export bgen-1.{1,2,3}".

Refer to https://www.chg.ox.ac.uk/~gav/bgen_format/ for a detailed description of the format.


.bim (PLINK extended MAP file)

Variant information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-bim can be used to update just this file.)

A text file with no header line, and one line per variant with the following six fields:

  1. Chromosome code
  2. Variant ID
  3. Position in centimorgans (safe to use dummy value of '0')
  4. Base-pair coordinate (1-based; limited to 2^31-2)
  5. ALT ("A1" in PLINK 1.x) allele code
  6. REF ("A2" in PLINK 1.x) allele code

A few notes:

  • Yes, the ALT column comes before the REF column in a .bim file.
  • When .bed files are involved, the ALT and REF allele codes will sometimes be swapped, since that's PLINK 1.x's default behavior whenever the true REF allele is less common than the ALT allele in the current dataset. If that's a problem, you can use --ref-allele to swap them back.
  • It is safe to change a .bim file's extension to .pvar and use --pfile to load it.
  • Variants with negative bp coordinates are ignored by PLINK.
  • PLINK 1.9 and 2.0 permit the centimorgan column to be omitted. (However, omission is not recommended if the .bim file needs to be read by other software.)
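
The field order above, including the ALT-before-REF quirk and the optional centimorgan column, can be sketched as a small parser. This is an illustrative reader with hypothetical names, not PLINK's own loader, and it assumes whitespace-delimited fields.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    variant_id: str
    cm: str   # kept as text; '0' is a common dummy value
    pos: int
    alt: str  # "A1" in PLINK 1.x
    ref: str  # "A2" in PLINK 1.x


def parse_bim_line(line: str) -> BimRecord:
    """Parse one .bim line with 6 fields, or 5 if centimorgans are omitted."""
    fields = line.split()
    if len(fields) == 6:
        chrom, variant_id, cm, pos, alt, ref = fields
    elif len(fields) == 5:
        chrom, variant_id, pos, alt, ref = fields
        cm = "0"  # assumption: fill in the dummy value when the column is absent
    else:
        raise ValueError("expected 5 or 6 fields, got %d" % len(fields))
    return BimRecord(chrom, variant_id, cm, int(pos), alt, ref)
```

Note that the fifth field is returned as `alt` and the sixth as `ref`, matching the column order described above.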

.bins (allele count or frequency histogram)

A text file with a header line, followed by one line per [start, end) histogram bin with the following two fields:

Header | Contents
BIN_START | Start of bin
OBS_CT | Number of variants in the bin

The end of the current bin interval is the next line's BIN_START value (or positive infinity if there is no next line).
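
A minimal sketch of how the half-open intervals can be recovered from a .bins file follows. It assumes the first line is the header and that fields are whitespace-delimited; the function name is hypothetical.

```python
import math


def bin_intervals(lines):
    """Return (start, end, count) tuples for each [start, end) bin."""
    rows = [ln.split() for ln in lines if ln.strip()]
    body = rows[1:]  # skip the header line
    starts = [float(r[0]) for r in body]
    counts = [int(r[1]) for r in body]
    # Each bin ends where the next one starts; the last bin is unbounded.
    ends = starts[1:] + [math.inf]
    return list(zip(starts, ends, counts))
```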


.clumps (reprocessed LD-clumped reports)

Produced by --clump.

A text file with a header line, and one line per index variant (lowest p-values first) with the following fields:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | maybea1, a1 | Tested allele
F | maybef, f | 1-based file number
[NEG_LOG10_]P | (required) | Index variant p-value (or -log10(p))
TOTAL | total | Number of other variants in clump
CLUMP_FIRST_POS | maybebounds, bounds | POS of first member with p < --clump-p2 threshold
CLUMP_LAST_POS | maybebounds, bounds | POS of last member with p < --clump-p2 threshold
NONSIG | bins | Number of clumped variants with p ≥ [highest p-value boundary]
S<bin boundary>, ... | bins | Number of clumped variants with [lower boundary] ≤ p < [this boundary]
SP2 | sp2 | Comma-delimited IDs, and possibly A1 allele and/or file number, of members with p < --clump-p2 threshold
RANGES | (with --clump-range[0]) | Comma-separated list of overlapping ranges

S<bin boundary> columns are in decreasing-p-value order, and the bin-boundary component of each column name no longer omits the leading "0.".

.cov (covariate table)

Produced by --write-covar, --make-[b]pgen/--make-bed, and --export when covariates have been loaded/specified. Valid input for --covar.

A text file with a header line, and one line per sample with the following columns:

Header | Column set | Contents
FID | maybefid, fid | Family ID
IID | (required) | Individual ID
SID | maybesid, sid | Source ID
PAT | maybeparents, parents | Paternal individual ID
MAT | maybeparents, parents | Maternal individual ID
SEX | sex | Sex (1 = male, 2 = female, 'NA' = unknown)
PHENO1 | pheno1 | All-missing phenotype column, if none loaded
<Pheno name>, ... | pheno1, phenos | Phenotype value(s) (only first if just 'pheno1')
<Covar name>, ... | (required) | Covariate values

(Note that --covar can also be used with files lacking a header row.)


.eigenvec, .eigenvec.allele, .eigenvec.var (principal components)

Produced by --pca. Accompanied by an .eigenval file, which contains one eigenvalue per line.

The .eigenvec file is a text file with a header line and between 1+V and 3+V columns per sample, where V is the number of requested principal components. The first columns contain the sample ID, and the rest are principal component scores in the same order as the .eigenval values (with column headers 'PC1', 'PC2', ...).

With the 'allele-wts' modifier, an .eigenvec.allele file is also generated. It's a text file with a header line, followed by one line per allele with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | (required) | Current allele
AX | ax | Other alleles, comma-separated
PC1, PC2, ... | (required) | Principal component allele scores

Alternatively, with the 'biallelic-var-wts' modifier, an old-style .eigenvec.var file is generated. It's a text file with a header line, followed by one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
MAJ | maj | Major allele
NONMAJ | nonmaj | All nonmajor alleles, comma-separated
PC1, PC2, ... | (required) | Principal component variant weights; signs are w.r.t. the major allele

.fam (PLINK sample information file)

Sample information file accompanying a .bed or biallelic .pgen binary genotype table. (--make-just-fam can be used to update just this file.)

A text file with no header line, and one line per sample with the following six fields:

  1. Family ID ('FID')
  2. Individual ID ('IID'; cannot be '0')
  3. Individual ID of father ('0' if father isn't in dataset)
  4. Individual ID of mother ('0' if mother isn't in dataset)
  5. Sex code ('1' = male, '2' = female, '0' = unknown)
  6. Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)

.fst.summary (all-population-pairs Wright's FST report)

Produced by --fst.

A text file with a header line, and then one line per population-pair with the following columns:

Header | Column set | Contents
POP1 | (required) | First population ID
POP2 | (required) | Second population ID
OBS_CT | nobs | Number of variants with valid FST estimates
'HUDSON_FST'/'WC_FST' | (required) | Between-population FST estimate
SE | (required) | Standard error of FST estimate, if blocksize= specified

.fst.var (per-variant Wright's FST report for one population pair)

Produced by --fst when 'report-variants' is specified. A separate file is generated for each population pair.

A text file with a header line, and then one line per autosomal variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
OBS_CT | nobs | Number of (nonmissing) genotype observations across population pair
POP1_ALLELE_CT | nallele | Number of nonmissing allele observations in first population
POP2_ALLELE_CT | nallele | Number of nonmissing allele observations in second population
FST_NUMER | fstfrac | Numerator of FST estimate
FST_DENOM | fstfrac | Denominator of FST estimate
'HUDSON_FST'/'WC_FST' | fst | Wright's FST estimate

.gcount (genotype count report)

Produced by --geno-counts.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
HOM_REF_CT | homref | Homozygous-ref count
HET_REF_ALT1_CT | refalt1 | Heterozygous ref-alt1 count
HET_REF_ALT_CTS | refalt | Comma-separated het ref-altx counts
HOM_ALT1_CT | homalt1 | Homozygous-alt1 count
TWO_ALT_GENO_CTS | altxy | Comma-separated altx-alty counts, in (1/1)-(1/2)-(2/2)-(1/3)-... order
DIPLOID_GENO_CTS | xy | Similar to altxy, except reference allele included
HAP_REF_CT | hapref | Haploid-ref count
HAP_ALT1_CT | hapalt1 | Haploid-alt1 count
HAP_ALT_CTS | hapalt | Comma-separated haploid-altx counts
HAP_CTS | hap | Similar to hapalt, except ref also included
GENO_NUM_CTS | numeq | "0/0=<hom ref ct>,0/1=<het ref-alt1>,...,0=<hap ref>" etc.; zero-counts are omitted; '.' if all genotypes missing
MISSING_CT | missing | Number of missing genotypes
OBS_CT | nobs | Number of (nonmissing) genotype observations

.gen (Oxford text genotype file format)

Native text genotype file format for Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST. Should always be accompanied by a .sample file. Imported with --data/--gen, and produced by "--export oxford[-v2]".

A text file with no header line, and one line per variant with either 3N+5 or 3N+6 fields where N is the number of samples. Each line stores information for a single SNP.

In the 3N+5 case (corresponding to the original specification), the first five fields are:

  1. "SNP ID"
  2. rsID (treated by PLINK as the main variant ID)
  3. Base-pair coordinate
  4. Allele 1 (usually minor, use 'ref-first' when importing to treat as REF)
  5. Allele 2 (usually major, use 'ref-last' when importing to treat as REF)

Unless the chromosome code was declared with --oxford-single-chr (in which case the SNP ID column is ignored), PLINK has no choice but to assume that the "SNP ID" column actually stores chromosome codes. (This is the convention when PLINK exports a 5-leading-column .gen file.)

The newer 3N+6 column flavor has a dedicated chromosome column in front. This was not supported by PLINK 1.9 or 2.0 before 16 Apr 2021.

Each subsequent triplet of values then indicates the likelihoods of the homozygote A1, heterozygote, and homozygote A2 genotypes at this variant, respectively, for one sample. If they add up to less than one, the remainder is a no-call probability weight.

The PLINK 2 binary format can represent allele count expected values, but it does not distinguish between e.g. {P(hom-ref)=0.28, P(het)=0.52, P(hom-alt)=0.2} and {P(hom-ref)=0.08, P(het)=0.92, P(hom-alt)=0}, and it ignores the no-call probability weight (though "0 0 0" will be correctly converted to a missing call). The --import-dosage-certainty flag can be used during import to replace some of the most uncertain genotype calls with missing values.
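
To make the information loss concrete, here is a minimal sketch that reduces a genotype-probability triplet to an expected ALT allele count. It is a simplified model with a hypothetical function name: it treats "0 0 0" as missing, and it does not try to reproduce how PLINK 2 handles a partial no-call weight or --import-dosage-certainty.

```python
def alt_dosage(p_hom_ref, p_het, p_hom_alt):
    """Expected ALT allele count for one sample, or None for '0 0 0'."""
    if p_hom_ref == 0 and p_het == 0 and p_hom_alt == 0:
        return None  # all-zero triplet is a missing call
    # Expected count = 0 * P(hom-ref) + 1 * P(het) + 2 * P(hom-alt)
    return p_het + 2.0 * p_hom_alt


# The two triplets from the text collapse to (approximately) the same value.
a = alt_dosage(0.28, 0.52, 0.2)
b = alt_dosage(0.08, 0.92, 0.0)
```

Both `a` and `b` are about 0.92, which is why the two distributions cannot be told apart once only the expected allele count is stored.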


.glm.firth, .glm.logistic[.hybrid] (logistic/Firth regression association statistics)

Produced by --glm with a case/control phenotype.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | (required) | Counted allele in regression (note 1)
OMITTED | omitted | Omitted allele
A1_CT (note 2) | a1count | Total A1 allele count (can be decimal with dosage data)
ALLELE_CT (note 2) | totallele | Allele observation count
A1_CASE_CT (note 2) | a1countcc | A1 count in cases
A1_CTRL_CT (note 2) | a1countcc | A1 count in controls
CASE_ALLELE_CT (note 2) | totallelecc | Case allele observation count
CTRL_ALLELE_CT (note 2) | totallelecc | Control allele observation count
CASE_NON_A1_CT | gcountcc | Case genotypes with 0 copies of A1
CASE_HET_A1_CT | gcountcc | Case genotypes with 1 copy of A1
CASE_HOM_A1_CT | gcountcc | Case genotypes with 2 copies of A1
CTRL_NON_A1_CT | gcountcc | Control genotypes with 0 copies of A1
CTRL_HET_A1_CT | gcountcc | Control genotypes with 1 copy of A1
CTRL_HOM_A1_CT | gcountcc | Control genotypes with 2 copies of A1
A1_FREQ | a1freq | A1 allele frequency
A1_CASE_FREQ | a1freqcc | A1 allele frequency in cases
A1_CTRL_FREQ | a1freqcc | A1 allele frequency in controls
MACH_R2 | machr2 | MaCH imputation quality metric
FIRTH? | firth | Reports whether Firth reg. was used ('firth-fallback' only)
TEST | test | Test identifier
OBS_CT | nobs | Number of samples in regression
BETA | beta | Regression coefficient (for A1 allele)
OR | orbeta | Odds ratio (for A1 allele)
[LOG(OR)_]SE | se | Standard error of log-odds (i.e. beta)
L## | ci | Bottom of symmetric approx. confidence interval (with --ci)
U## | ci | Top of symmetric approx. confidence interval (with --ci)
Z_[OR_F_]STAT | tz | F-statistic for joint test, Wald Z-score for logistic/Firth regression
[NEG_LOG10_]P | p | Asymptotic p-value (or -log10(p)) for Z/chisq-stat
ERRCODE | err | When result is 'NA', an error code describing the reason

All statistics are computed across just the samples used in the regression.


.glm.linear (linear regression association statistics)

Produced by --glm with a quantitative phenotype.

A text file with a header line, and then one line per variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | (required) | Counted allele in regression (note 1)
OMITTED | omitted | Omitted allele
A1_CT (note 2) | a1count | Total A1 allele count (can be decimal with dosage data)
ALLELE_CT (note 2) | totallele | Allele observation count
A1_FREQ | a1freq | A1 allele frequency
MACH_R2 | machr2 | MaCH imputation quality metric
TEST | test | Test identifier
OBS_CT | nobs | Number of samples in regression
BETA | beta, orbeta | Regression coefficient (for A1 allele)
SE | se | Standard error of beta
L## | ci | Bottom of symmetric approx. confidence interval (with --ci)
U## | ci | Top of symmetric approx. confidence interval (with --ci)
T_[OR_F_]STAT | tz | F-statistic for joint test; t-statistic for linear regression
[NEG_LOG10_]P | p | Asymptotic p-value (or -log10(p)) for T/chisq-stat
ERRCODE | err | When result is 'NA', an error code describing the reason

All statistics are computed across just the samples used in the regression.

Note 1: For multiallelic variants, this column may contain multiple comma-separated alleles when the result doesn't depend on which allele is A1.
Note 2: For males on chrX, these values are normally computed as if males were diploid, since that's the encoding used in the regression. The exception is when "--xchr-model 1" is specified, where male 0..1 values coexist with female 0..2 values in the regression. In that case, these columns will also be based on the mixed male 0..1, female 0..2 scaling.
    To be clear, --glm only uses this 0..2 haploid coding on chrX, to put males and females on an equal footing in a world where X-inactivation is common. chrY/chrM use 0..1 coding.


.grm (GCTA text relationship matrix)

Produced by --make-grm-list.

A text file with no header line, and one line per pair of samples (not necessarily distinct) with the following four fields:

  1. 1-based index of first sample in .grm.id file
  2. 1-based index of second sample in .grm.id file
  3. Number of observations (variants where neither sample has a missing call)
  4. Relationship value

.grm.N.bin, .grm.bin (GCTA 1.1+ triangular binary relationship matrix)

Produced by --make-grm-bin.

These files contain single-precision (4-byte) floating point values. Using 1-based matrix indices, the first value in each file is the (1, 1) relationship value (.grm.bin) or observation count (.grm.N.bin); the second and third values are the (2, 1) and (2, 2) relationships/counts; the fourth through sixth values are the (3, 1), (3, 2) and (3, 3) relationships/counts in that order; and so on.

Note that .grm.bin files generated by GCTA versions before 1.1 have a different format.
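
The triangular layout described above can be sketched as an index calculation plus a raw float reader. The little-endian byte order is an assumption not stated on this page, and both function names are hypothetical.

```python
import struct


def tri_index(i: int, j: int) -> int:
    """0-based position in the file of the 1-based matrix entry (i, j)."""
    if j > i:
        i, j = j, i  # the matrix is symmetric; only the lower triangle is stored
    # Rows 1..i-1 contribute 1 + 2 + ... + (i-1) values before row i starts.
    return (i - 1) * i // 2 + (j - 1)


def read_float32_values(raw: bytes):
    """Decode a .grm.bin / .grm.N.bin byte string into a list of floats."""
    n = len(raw) // 4
    # Assumption: little-endian 4-byte floats.
    return list(struct.unpack("<%df" % n, raw[: 4 * n]))
```

With this, the (3, 2) relationship is `values[tri_index(3, 2)]`, i.e. the fifth value in the file, matching the ordering given above.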


.haps (Oxford phased haplotype file)

Reference panel haplotype file format for IMPUTE2. Must be accompanied by a .legend file when no variant info header columns are present. Imported with --haps, and produced by "--export haps[legend]".

A text file with no header line, and either 2N+5 or 2N fields where N is the number of samples. In the former case, the first five columns are:

  1. Chromosome code
  2. Variant ID
  3. Base-pair coordinate
  4. Allele 0 (usually minor, use 'ref-first' when importing to treat as REF)
  5. Allele 1 (usually major, use 'ref-last' when importing to treat as REF)

This is followed by a pair of 0/1-valued haplotype columns for the first sample, then a pair of haplotype columns for the second sample, etc. (For male samples on chrX, the second column may contain dummy '-' entries; otherwise, missing genotype calls are not permitted.)
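
As an illustration of the two layouts, the following sketch splits a .haps line into its optional five leading columns and per-sample haplotype pairs. It assumes whitespace-delimited fields and uses hypothetical names; it does not validate allele codes beyond counting columns.

```python
def parse_haps_line(line: str, has_variant_info: bool = True):
    """Return (info_fields, [(hap1, hap2), ...]) for one .haps line."""
    fields = line.split()
    if has_variant_info:
        info, haps = fields[:5], fields[5:]  # 2N+5 layout
    else:
        info, haps = [], fields              # bare 2N layout (needs a .legend)
    if len(haps) % 2 != 0:
        raise ValueError("haplotype columns must come in pairs")
    # Entries stay as strings, since male chrX calls may carry a dummy '-'.
    pairs = [(haps[k], haps[k + 1]) for k in range(0, len(haps), 2)]
    return info, pairs
```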


.hardy (Hardy-Weinberg equilibrium exact test report)

Produced by --hardy when autosomal diploid variants are present.

A text file with a header line, and one line per autosomal diploid variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | (required) | Tested allele
AX | ax | Non-A1 alleles, comma-separated
HOM_A1_CT | gcounts | Homozygous-A1 genotype count
HET_A1_CT | gcounts | Heterozygous-A1 genotype count
TWO_AX_CT | gcounts | # of nonmissing calls with no A1 copies
GCOUNTS | gcount1col | gcounts values in a single comma-separated column
O(HET_A1) | hetfreq | Observed heterozygous-A1 frequency
E(HET_A1) | hetfreq | Expected heterozygous-A1 frequency
[NEG_LOG10_][MID]P | p | Hardy-Weinberg equilibrium exact test [mid-]p-value (or -log10(p))

.hardy.x (Graffelman-Weir extended chrX HWE test report)

Produced by --hardy when chrX variants are present.

A text file with a header line, and one line per chrX variant with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | (required) | Variant ID
REF | ref | Reference allele
ALT1 | alt1 | Alternate allele 1
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
A1 | (required) | Tested allele
AX | ax | Non-A1 alleles, comma-separated
FEMALE_HOM_A1_CT | gcounts | Female homozygous-A1 genotype count
FEMALE_HET_A1_CT | gcounts | Female heterozygous-A1 genotype count
FEMALE_TWO_AX_CT | gcounts | # of nonmissing female calls with no A1 copies
MALE_A1_CT | gcounts | Male A1 allele count
MALE_AX_CT | gcounts | Male non-A1 allele count
GCOUNTS | gcount1col | gcounts values in a single comma-separated column
O(FEMALE_HET_A1) | hetfreq | Observed het-A1 frequency
E(FEMALE_HET_A1) | hetfreq | Expected het-A1 frequency
FEMALE_A1_FREQ | sexaf | Female A1 allele frequency
MALE_A1_FREQ | sexaf | Male A1 allele frequency
FEMALE_ONLY_[NEG_LOG10_][MID]P | femalep | Old female-only HWE exact test [mid-]p-value (or -log10(p))
[NEG_LOG10_][MID]P | p | Graffelman-Weir HWE test [mid-]p-value (or -log10(p))

.het (method-of-moments F coefficient estimates)

Produced by --het.

A text file with a header line, and one line per sample with the following columns:

Header | Column set | Contents
FID | maybefid, fid | Family ID
IID | (required) | Individual ID
SID | maybesid, sid | Source ID
O(HOM) | hom | Observed number of homozygous genotypes
E(HOM) | hom | Expected number of homozygous genotypes
O(HET) | het | Observed number of heterozygous genotypes
E(HET) | het | Expected number of heterozygous genotypes
OBS_CT | nobs | Number of (nonmissing, non-monomorphic) autosomal genotype observations
F | f | Method-of-moments F coefficient estimate

.id (Sample ID list)

When generated by PLINK 2, this is a text file which may or may not have a header line. Without a header line (the default for .grm.id files; can be forced for other .id files with --no-id-header), a single-column file contains IIDs, and a two-column file contains FID followed by IID. With a header line, there's one line per sample after it, with the following columns:

Header | Contents
FID | Family ID (present iff .psam or --update-ids file has it)
IID | Individual ID (always present)
SID | Source ID (present iff .psam or --update-ids file has it)

.kin0 (KING-robust kinship coefficient report)

Produced by --make-king-table.

A text file with a header line, and one line per sample pair with kinship coefficient no smaller than the --king-table-filter value. When --king-table-filter is not specified, all sample pairs are included. The following columns are present:

Header | Column set | Contents
FID1 | maybefid, fid | FID of first sample in current pair
ID1 | id | IID of first sample in current pair
SID1 | maybesid, sid | SID of first sample in current pair
FID2 | maybefid, fid | FID of second sample in current pair
ID2 | id | IID of second sample in current pair
SID2 | maybesid, sid | SID of second sample in current pair
NSNP | nsnp | Number of variants considered (autosomal, neither call missing)
HETHET | hethet | Proportion/count of considered call pairs which are het-het
IBS0 | ibs0 | Proportion/count of considered call pairs which are opposite homs
HET1_HOM2 | ibs1 | Proportion/count of sample 1 het, sample 2 hom
HET2_HOM1 | ibs1 | Proportion/count of sample 1 hom, sample 2 het
KINSHIP | kinship | KING-robust between-family kinship estimate

.king[.bin] (KING-robust kinship coefficient matrix)

Produced by --make-king. Accompanied by a .king[.bin].id file containing sample IDs.

If text, a tab-delimited file that is either lower-triangular (excluding the diagonal) or square. If it's square, the upper-right triangle may be either zeroed out or the mirror-image of the lower-left triangle, depending on whether the 'square0' or 'square' modifier was used.

The binary format is semantically identical; it just has nothing but single- (4-byte) or double-precision (8-byte) floating point values, instead of text+delimiters+linebreaks.


.legend (Oxford single-chromosome variant information file)

Single-chromosome variant information file accompanying a bare .haps reference panel haplotype file. Imported with --legend, and produced by "--export hapslegend".

A text file with a header line, and one line per variant with the following four columns:

Header | Contents
id | Variant ID
position | Base-pair coordinate
a0 | Allele 0 (usually minor, use 'ref-first' to treat as REF)
a1 | Allele 1 (usually major, use 'ref-last' to treat as REF)

.map (PLINK 1 text fileset variant information file)

Variant information file accompanying a .ped text pedigree + genotype table.

A text file with no expected header line, and one line per variant with the following 3-4 fields:

  1. Chromosome code. PLINK 1.9 and 2.0 also permit contig names here, but most older programs do not.
  2. Variant ID
  3. Position in centimorgans (optional; safe to use dummy value of '0')
  4. Base-pair coordinate (1-based; limited to 2^31-2)

All lines must have the same number of columns (so either no lines contain the centimorgans column, or all of them do).

Lines starting with '#' are supposed to be treated as comments, but this was not consistently supported by PLINK 1.9 and 2.0 before Aug 2024.


.pdiff (two-fileset genotype/dosage discordance report)

Produced by --pgen-diff.

A text file with a header line, and then one line per discordance with the following columns:

Header | Column set | Contents
CHROM | chrom | Chromosome code
POS | pos | Base-pair coordinate
ID | id | Variant ID
REF | ref | Reference allele
ALT | alt | All alternate alleles, comma-separated
PROVISIONAL_REF? | maybeprovref, provref | Reports whether REF allele is provisional
FID | maybefid, fid | Family ID
IID | (required) | Individual ID
SID | maybesid, sid | Source ID
'GT1'/'DS1' | geno | Genotype/dosage of first sample
'GT2'/'DS2' | geno | Genotype/dosage of second sample

.ped (PLINK 1/MERLIN/Haploview sample-major text genotype table)

Pedigree information + genotype call text file. Must be accompanied by a .map file. Loaded with --pedmap, and produced by "--export ped". This format is simultaneously highly inefficient, even relative to other text formats, and limited in scope (unobserved minor allele codes can't be stored); continued use is strongly discouraged.

Contains no header line, and one line per sample with 2V+6 fields where V is the number of variants. The first six fields are the same as those in a .fam file. The seventh and eighth fields are allele calls for the first variant in the .map file ('0' = no call); the 9th and 10th are allele calls for the second variant; and so on. All variants must be biallelic (or monomorphic, or all-missing).
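
The 2V+6 layout can be sketched as follows. This illustrative parser handles only the standard two-fields-per-call form (not compound genotypes or files missing initial fields), assumes whitespace-delimited input, and uses hypothetical names.

```python
def parse_ped_line(line: str, n_variants: int):
    """Split one .ped line into its six sample fields and V allele-call pairs."""
    fields = line.split()
    expected = 2 * n_variants + 6
    if len(fields) != expected:
        raise ValueError("expected %d fields, got %d" % (expected, len(fields)))
    fid, iid, pat, mat, sex, pheno = fields[:6]  # same as a .fam line
    calls = fields[6:]
    # Fields 7-8 belong to the first .map variant, 9-10 to the second, etc.
    genotypes = [(calls[2 * k], calls[2 * k + 1]) for k in range(n_variants)]
    return {
        "fid": fid, "iid": iid, "pat": pat, "mat": mat,
        "sex": sex, "pheno": pheno, "genotypes": genotypes,
    }
```

A ('0', '0') pair in the result corresponds to a no-call for that variant.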

If all alleles are single-character, PLINK 1.9 and 2.0 will correctly parse the more compact "compound genotype" variant of this format, where each genotype call is represented as a single two-character string. This does not require the use of an additional loading flag. You can produce such a file with "--export compound-genotypes".

It is also possible to load .ped files missing some initial fields.

Lines starting with '#' are supposed to be treated as comments, but this was not supported by PLINK 1.9 and 2.0 before Aug 2024.
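
The 2V+6 layout and its "compound genotype" variant can be sketched as below. This is a hedged Python illustration only; parse_ped_line is a hypothetical name, and the sketch assumes the caller already knows V from the accompanying .map file. It does not handle .ped files that are missing initial fields.

```python
def parse_ped_line(line, variant_ct):
    """Split one .ped line into (six sample fields, list of allele pairs).

    Accepts both the standard form (two allele fields per variant) and the
    compound form (one two-character string per variant).
    """
    fields = line.split()
    sample_info, calls = fields[:6], fields[6:]
    if len(sample_info) != 6:
        raise ValueError("fewer than six leading fields")
    if len(calls) == 2 * variant_ct:
        # standard form: fields 7-8 are variant 1, fields 9-10 variant 2, ...
        genotypes = list(zip(calls[0::2], calls[1::2]))
    elif len(calls) == variant_ct:
        # compound form: each call is a single two-character string
        if any(len(c) != 2 for c in calls):
            raise ValueError("compound genotypes must be two characters")
        genotypes = [(c[0], c[1]) for c in calls]
    else:
        raise ValueError("expected 2V+6 or V+6 fields")
    return sample_info, genotypes
```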


.pgen, .pgen.pgi (PLINK 2 binary genotype table)

PLINK 2's preferred way to represent genotype calls. Must be accompanied by .pvar/.bim and .psam/.fam files. Loaded with --pfile/--bpfile, and generated with --make-pgen/--make-bpgen and all import commands.

Most .pgen files have an embedded index, and do not have an accompanying .pgen.pgi file. When the index is not embedded, PLINK 2 expects it to be stored in "<.pgen filename>.pgi".

A draft specification of these formats is available. The first version will be finalized around the beginning of PLINK 2.0 beta testing.


.psam (PLINK 2 sample information file)

Sample information file accompanying a .pgen binary genotype table. (--make-just-psam can be used to update just this file.)

A text file which usually has at least one header line, where only the last header line starts with '#FID' or '#IID'. This final header line specifies the columns in the .psam file; the following intermediate column headers are recognized:

  1. IID (individual ID; required)
  2. SID (source ID, when there are multiple samples for the same individual)
  3. PAT (individual ID of father, '0' if unknown)
  4. MAT (individual ID of mother, '0' if unknown)
  5. SEX ('1' = male, '2' = female, 'NA'/'0' = unknown)

(FID must either be the first column, or absent. If it's absent, all FID values are now assumed to be '0'.) Any other value is treated as a phenotype/covariate name; see the phenotype/covariate documentation for column encoding details.

If no header line is present, the columns are assumed to be in .fam file order (FID, IID, PAT, MAT, SEX, PHENO1).
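
The header rules above could be applied roughly as follows. This is a minimal Python sketch under stated assumptions: read_psam is a hypothetical helper, every value is kept as a string, and phenotype/covariate columns are not decoded. It fills in FID = '0' when that column is absent and falls back to .fam column order when there is no header line.

```python
FAM_ORDER = ["FID", "IID", "PAT", "MAT", "SEX", "PHENO1"]

def read_psam(lines):
    """Return one dict per sample, keyed by column name."""
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # only the final header line starts with '#FID' or '#IID'
            if line.startswith("#FID") or line.startswith("#IID"):
                header = line[1:].split()
            continue
        cols = header if header is not None else FAM_ORDER
        row = dict(zip(cols, line.split()))
        row.setdefault("FID", "0")  # absent FID column is treated as '0'
        rows.append(row)
    return rows
```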


.phy (relaxed PHYLIP format)

Multiple sequence alignment text file, produced by "--export phylip[-phased]", and recognized by FastTree, IQ-TREE, and several other phylogenetic tools. This format cannot be loaded by PLINK.

The header line contains two numbers, the number of sequences followed by the number of nucleotide codes per sequence.

Each subsequent line contains two fields. The first field contains the sample ID, and is padded by spaces to a fixed width, such that the longest sample ID is followed by exactly 3 spaces. (This imitates the behavior of vcf2phylip.) The second field contains IUPAC nucleotide codes.
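
The padding rule can be made concrete with a short Python sketch. format_phylip is a hypothetical helper, not PLINK code, and the single space between the two header numbers is an assumption, since the text above only says the header contains two numbers.

```python
def format_phylip(sequences):
    """Render (sample_id, nucleotide_string) pairs as relaxed-PHYLIP text."""
    if not sequences:
        raise ValueError("need at least one sequence")
    seq_len = len(sequences[0][1])
    if any(len(seq) != seq_len for _, seq in sequences):
        raise ValueError("all sequences must have the same length")
    # pad so that the longest sample ID is followed by exactly 3 spaces
    width = max(len(sample_id) for sample_id, _ in sequences) + 3
    lines = [f"{len(sequences)} {seq_len}"]
    for sample_id, seq in sequences:
        lines.append(sample_id.ljust(width) + seq)
    return "\n".join(lines) + "\n"
```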


.pvar (PLINK 2 variant information file)

Variant information file accompanying a .pgen binary genotype table. (--make-just-pvar can be used to update just this file.)

A text file which usually has at least one header line, where only the last header line starts with '#CHROM'. This final header line specifies the columns in the .pvar file; the following intermediate column headers are recognized:

  1. POS (base-pair coordinate)
  2. ID (variant ID; required)
  3. REF (reference allele)
  4. ALT (alternate alleles, comma-separated)
  5. QUAL (phred-scaled quality score for whether the locus is variable at all)
  6. FILTER ('PASS', '.', or semicolon-separated list of failing filter codes)
  7. INFO (semicolon-separated list of flags and key-value pairs, with types declared in header)
  8. FORMAT (terminates header line parsing)
  9. CM (centimorgan position)

In particular, a VCF file, or a trimmed VCF file with all columns past the 5th (or 6th, etc.) removed, is valid input for anything expecting a .pvar-format file.

The following VCF-style header lines are also recognized:

  1. "##INFO=<ID=PR,Number=0,Type=Flag...": Indicates the INFO/PR flag, which marks 'provisional' reference alleles (i.e. imported from a file which does not consistently track which allele is reference and which are alternates), is present. (This information is also present in .pgen files, and the loader reports an error when the .pvar and .pgen flags don't match.)
  2. "##chrSet=...": Explicitly specifies the chromosome set. E.g. --make-pgen + --dog will cause "##chrSet=<ID=1,autosomePairCt=38,X,Y,XY,M>" to be written to the .pvar header, and as a consequence it isn't necessary to include the --dog flag when loading the new fileset.

When no header line is present, the columns are assumed to be in .bim file order (CHROM, ID, CM, POS, ALT, REF; or if only 5 columns are present, CM is assumed to be omitted).
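
One way to determine the column layout of a .pvar-format file is sketched below in Python. pvar_columns is a hypothetical helper; reading "terminates header line parsing" as "ignore FORMAT and everything after it" is this sketch's interpretation rather than a documented algorithm.

```python
BIM_ORDER = ["CHROM", "ID", "CM", "POS", "ALT", "REF"]

def pvar_columns(lines):
    """Return the list of column names that apply to the variant lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            cols = line[1:].split()
            if "FORMAT" in cols:
                # assumption: FORMAT and any later (sample) columns are ignored
                cols = cols[:cols.index("FORMAT")]
            return cols
        if line.startswith("#") or not line.strip():
            continue  # '##' meta lines and blank lines
        # first variant line reached without a '#CHROM' header: .bim order
        n = len(line.split())
        if n == 6:
            return list(BIM_ORDER)
        if n == 5:
            return [c for c in BIM_ORDER if c != "CM"]
        raise ValueError("headerless .pvar needs 5 or 6 columns")
    raise ValueError("no header line or variant line found")
```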


.raw (additive + dominant component file)

Produced by "--export {A,AD}"; suitable for loading from R. This format cannot be loaded by PLINK.

A text file with a header line, and then one line per sample with V+6 (for "--export A") or 2V+6 (for "--export AD") fields, where V is the number of variants. The header line does not contain a preceding '#'. The first six fields are:

FID        Family ID
IID        Individual ID
PAT        Paternal individual ID
MAT        Maternal individual ID
SEX        Sex (1 = male, 2 = female, 0 = unknown)
PHENOTYPE  First active non-categorical phenotype (missing value if none)

This is followed by one or two fields per variant:

<Variant ID>_<counted allele>  Allelic dosage (missing = 'NA', haploid scaled to 0..2)
<Variant ID>_HET               Dominant component (1 = het). Requires "--export AD".

If 'include-alt' was specified, the header line also names alternate allele codes in parentheses, e.g. 'rs5939319_G(/A)'.


.rel[.bin] (relationship matrix)

Produced by --make-rel. Accompanied by a .rel[.bin].id file containing sample IDs.

Contents are identical to that of a .grm/.grm.bin file. Possible shapes are essentially the same as for .king files; the only difference is that .king files have an omitted or constant-0.5 diagonal while .rel files do not.


.sample (Oxford sample information file)

Sample information file accompanying a .gen or .bgen genotype dosage file, or a .haps phased reference panel. Loaded with --data/--sample, and produced by --export in several cases.

By default, the space-delimited .sample files emitted by --export have two header lines, and then one line per sample with 4+ fields:

First header line  Second header line  Subsequent contents
ID_1               0                   Family ID
ID_2               0                   Individual ID
missing            0                   Missing call frequency
sex                D                   Sex code ('1' = male, '2' = female, '0' = unknown)
<Pheno name>, ...  'B'/'D'/'P'         Binary ('0' = control, '1' = case), discrete (categorical, positive integers), or continuous phenotype; missing values represented by 'NA'

(As of 6 Apr 2021, PLINK 2 accepts 'C' as a synonym for column type 'P' in .sample input files.)

With --export's 'sample-v2' modifier, this is adjusted to:

First header line  Second header line  Subsequent contents
ID                 0                   Sample ID
missing            0                   (unchanged)
father             D                   Paternal individual ID
mother             D                   Maternal individual ID
sex                D                   Unknown sex encoded as 'NA' instead of '0'
<Pheno name>, ...  'B'/'D'/'P'         For type 'D', original category names are saved instead of just integers; otherwise unchanged

Note that older programs are likely to support only the first .sample dialect.

A specification for this format is on the QCTOOL v2 website.
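
For orientation, the default (first) dialect could be written out as in the following Python sketch. format_sample_file is a hypothetical helper and the row tuple layout is an assumption made for this example; it only mirrors the two header lines and the 'NA' missing-value convention described above.

```python
def format_sample_file(rows, pheno_names, pheno_types):
    """Render a default-dialect .sample file as a string.

    rows: iterable of (fid, iid, missing_rate, sex, phenotype_values);
          a phenotype value of None is written as 'NA'.
    pheno_types: one of 'B', 'D', 'P' per phenotype.
    """
    if len(pheno_names) != len(pheno_types):
        raise ValueError("one type code is needed per phenotype")
    header1 = ["ID_1", "ID_2", "missing", "sex"] + list(pheno_names)
    header2 = ["0", "0", "0", "D"] + list(pheno_types)
    lines = [" ".join(header1), " ".join(header2)]
    for fid, iid, missing_rate, sex, phenos in rows:
        values = ["NA" if p is None else str(p) for p in phenos]
        lines.append(" ".join([fid, iid, str(missing_rate), str(sex)] + values))
    return "\n".join(lines) + "\n"
```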


.scount (sample variant-count report)

Produced by --sample-counts.

A text file with a header line, and then one line per sample with the following columns:

Header                         Column set       Contents
FID                            maybefid, fid    Family ID
IID                            (required)       Individual ID
SID                            maybesid, sid    Source ID
SEX                            sex              Sex (1 = male, 2 = female, 'NA' = unknown)
HOM_CT                         hom              Homozygous genotype count
HOM_REF_CT                     homref           Hom-REF genotype count
HOM_ALT_CT                     homalt           Hom-ALT genotype count
HOM_ALT_SNP_CT                 homaltsnp        Hom-ALT SNP (single-character REF and ALT) count
HET_CT                         het              Heterozygous genotype count
HET_REF_ALT_CT                 refalt           Het. REF-ALTx genotype count
HET_2ALT_CT                    het2alt          Het. ALTx-ALTy genotype count
HET_SNP_CT                     hetsnp           Het. SNP genotype count
DIPLOID_TRANSITION_CT          dipts            Diploid SNP transition (A↔G, C↔T) count
TRANSITION_CT                  ts               SNP transition count
DIPLOID_TRANSVERSION_CT        diptv            Diploid SNP transversion count
TRANSVERSION_CT                tv               SNP transversion count
DIPLOID_NONSNP_NONSYMBOLIC_CT  dipnonsnpsymb    Diploid non-SNP, non-symbolic variant count
NONSNP_NONSYMBOLIC_CT          nonsnpsymb       Non-SNP, non-symbolic variant count
SYMBOLIC_CT                    symbolic         Symbolic (starting with '<') variant count
NONSNP_CT                      nonsnp           Non-SNP variant count
DIPLOID_SINGLETON_CT           dipsingle        Number of singletons relative to this dataset, considering just diploid calls [3]
SINGLETON_CT                   single           Number of singletons relative to this dataset
HAP_REF_INCL_FEMALE_Y_CT       haprefwfemaley   Haploid REF count, counting chrY for everyone
HAP_REF_CT                     hapref           Haploid REF count, excluding chrY for nonmales
HAP_ALT_INCL_FEMALE_Y_CT       hapaltwfemaley   Haploid ALT count, counting chrY for everyone
HAP_ALT_CT                     hapalt           Haploid ALT count, excluding chrY for nonmales
MISSING_INCL_FEMALE_Y_CT       missingwfemaley  Missing call count, counting chrY for everyone
MISSING_CT                     missing          Missing call count, excluding chrY for nonmales

The 'hetsnp', 'dipts'/'ts'/'diptv'/'tv', 'dipnonsnpsymb'/'nonsnpsymb', 'symbolic', and 'nonsnp' columns count each ALT allele in a heterozygous ALTx-ALTy genotype separately, since they can be of different subtypes. (I.e. if they are of the same subtype, the corresponding count is incremented by 2.) As a consequence, these columns are unaffected by variant split/join.

3: If the ALT allele in a chrX biallelic variant appears in exactly one female and one male, that counts as a singleton in this column for just the female.
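
The allele subtypes that these columns distinguish can be illustrated with a small Python classifier. This is a sketch based only on the definitions given in this section (SNP = single-character REF and ALT, symbolic = starts with '<', transitions = A↔G and C↔T); classify_alt is a hypothetical name, and rejecting single-character codes outside A/C/G/T is an assumption, since the text does not say how those are counted.

```python
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}

def classify_alt(ref, alt):
    """Classify one ALT allele relative to REF."""
    if alt.startswith("<"):
        return "symbolic"
    if len(ref) == 1 and len(alt) == 1:
        pair = frozenset((ref.upper(), alt.upper()))
        if len(pair) != 2 or not pair <= set("ACGT"):
            # assumption: only distinct A/C/G/T codes are handled here
            raise ValueError("unsupported single-character allele codes")
        return "transition" if pair in TRANSITION_PAIRS else "transversion"
    return "nonsnp_nonsymbolic"
```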


.sdiff (sample-pair discordance report)

Produced by --sample-diff.

A text file with a header line, and then one line per discordance with the following columns:

Header            Column set             Contents
CHROM             chrom                  Chromosome code
POS               pos                    Base-pair coordinate
ID                (required)             Variant ID
REF               ref                    Reference allele
ALT               alt                    All alternate alleles, comma-separated
PROVISIONAL_REF?  maybeprovref, provref  Reports whether REF allele is provisional
FID1              maybefid, fid          FID of first sample in current pair
IID1              id                     IID of first sample in current pair
SID1              maybesid, sid          SID of first sample in current pair
FID2              maybefid, fid          FID of second sample in current pair
IID2              id                     IID of second sample in current pair
SID2              maybesid, sid          SID of second sample in current pair
'GT1'/'DS1'       geno                   Genotype/dosage of first sample
'GT2'/'DS2'       geno                   Genotype/dosage of second sample

.sdiff.summary (sample-pair discordance count summary)

Produced by --sample-diff.

A text file with a header line, and then one line per sample pair with the following columns:

Header       Column set     Contents
FID1         maybefid, fid  FID of first sample in current pair
IID1         (required)     IID of first sample in current pair
SID1         maybesid, sid  SID of first sample in current pair
FID2         maybefid, fid  FID of second sample in current pair
IID2         (required)     IID of second sample in current pair
SID2         maybesid, sid  SID of second sample in current pair
OBS_CT       nobs           Number of genotype/dosage pairs considered
IBS_OBS_CT   nobsibs        Number of diploid hardcall-pairs
IBS0_CT      ibs0           # of diploid hardcall-pairs with no matching alleles
IBS1_CT      ibs1           # of diploid hardcall-pairs with exactly 1 matching allele
IBS2_CT      ibs2           # of diploid hardcall-pairs with 2 matching alleles
HALFMISS_CT  halfmiss       # of genotype/dosage pairs with exactly 1 missing call
DIFF_CT      diff           # of genotype/dosage discordances

.sexcheck (sex imputation report)

Produced by --check-sex/--impute-sex.

A text file with a header line, and one line per sample with the following columns:

Header  Column set     Contents
FID     maybefid, fid  Family ID
IID     (required)     Individual ID
SID     maybesid, sid  Source ID
PEDSEX  pedsex         Sex code in input file (1 = male, 2 = female, NA = unknown)
SNPSEX  (required)     Imputed sex code (1/2/NA)
STATUS  status         'OK' on nonmissing PEDSEX and SNPSEX match, 'PROBLEM' otherwise
F       xf             If chrX used, inbreeding coefficient estimated off chrX
YCOUNT  ycount         If chrY used, number of valid chrY genotypes
YRATE   yrate          If chrY used, chrY valid genotype rate
YOBS    yobs           If chrY used, number of chrY variants considered

.smiss (sample-based missing data report)

Produced by --missing.

A text file with a header line, and then one line per sample with the following columns:

Header                 Column set     Contents
FID                    maybefid, fid  Family ID
IID                    (required)     Individual ID
SID                    maybesid, sid  Source ID
MISS_PHENO1            misspheno1     First active phenotype missing (Y/N), Y if none
<Pheno name>, ...      missphenos     Y/N column for each loaded phenotype
MISSING_DOSAGE_CT      nmissdosage    Number of missing dosages
MISSING_CT             nmiss          Number of missing hardcalls, not counting het haploids
MISSING_AND_HETHAP_CT  nmisshh        Number of missing hardcalls, counting het haploids
HETHAP_CT              hethap         Number of heterozygous haploid hardcalls
OBS_CT                 nobs           Denominator (# variants for males, excludes chrY for females)
F_MISS_DOSAGE          fmissdosage    Missing dosage rate
F_MISS                 fmiss          Missing hardcall rate, not counting het haploids
F_MISS_AND_HETHAP      fmisshh        Missing hardcall rate, counting het haploids

When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).


.sscore (sample scores)

Produced by --score and --score-list.

A text file with a header line, and then one line per sample with the following columns:

Header                   Column set      Contents
FID                      maybefid, fid   Family ID
IID                      (required)      Individual ID
SID                      maybesid, sid   Source ID
PHENO1                   pheno1          All-missing phenotype column, if none loaded
<Pheno name>, ...        pheno1, phenos  Phenotype value(s) (only first if just 'pheno1')
ALLELE_CT                nallele         Number of alleles across scored variants (--score only)
DENOM                    denom           Denominator used for score average (--score only)
NAMED_ALLELE_DOSAGE_SUM  dosagesum       Sum of named allele dosages (--score only)
<Score name>_AVG, ...    scoreavgs       Score averages
<Score name>_SUM, ...    scoresums       Score sums

.ssf.tsv (association statistics in GWAS-SSF format)

Produced by --gwas-ssf postprocessing --glm output.

A text file with a header line, and then one line per variant with the following columns:

Header                    Contents
chromosome*               Chromosome code (1-25, where X=23, Y=24, MT=25)
base_pair_location*       Base-pair coordinate
effect_allele*            Counted allele in regression
other_allele*             Omitted allele
'beta'/'odds_ratio'*      Regression coefficient or odds ratio for effect_allele
standard_error*           Standard error of beta
effect_allele_frequency*  Frequency of effect_allele in regression
[neg_log_10_]p_value*     Asymptotic p-value or -log10(p)
variant_id                <chrom>_<pos>_<ref>_<alt> variant ID
rsid                      rsID
ci_upper                  Upper end of beta/odds_ratio confidence interval
ci_lower                  Lower end of beta/odds_ratio confidence interval
n                         Number of samples in regression
ref_allele                Indicates which allele is REF ('EA', 'OA', or '#NA')

(Since the --gwas-ssf command does not have a cols= modifier, a trailing '*' is used to denote mandatory GWAS-SSF fields in this table.)


.svd.pheno (summary phenotypes generated via SVD)

Produced by --pheno-svd.

A text file with a header line, and then one line per sample with the following columns:

Header          Column set     Contents
FID             maybefid, fid  Family ID
IID             (required)     Individual ID
SID             maybesid, sid  Source ID
SVDPHENO1, ...  (required)     New phenotype values


.svd.pheno_wts (singular values and right-singular vectors from phenotype SVD)

Produced by --pheno-svd.

A text file with a header line, and then one line per new phenotype with the following columns:

Header                 Column set  Contents
NEW_PHENO_ID           id          New phenotype ID
SINGULAR_VALUE         sv          Singular value from SVD
<Old pheno name>, ...  (required)  Right-singular vectors from SVD


.tfam (PLINK 1 sample information file)

Sample information file accompanying a .tped file; identical format to .fam files.


.tped (PLINK 1 variant-major text genotype table)

Variant information + genotype call text file. Must be accompanied by a .tfam file. Loaded with --tfile, and produced by "--export tped".

Contains no header line, and one line per variant with 2N+4 fields where N is the number of samples. The first four fields are the same as those in a .map file. The fifth and sixth fields are allele calls for the first sample in the .tfam file ('0' = no call); the 7th and 8th are allele calls for the second sample; and so on. All variants must be biallelic (or monomorphic, or all-missing).


.traw (variant-major additive component file)

Produced by "--export Av"; suitable for loading from R. Loaded with --import-dosage (note that several modifiers must be specified).

A text file with a header line without a leading '#', and then one line per variant with the following N+6 fields (where N is the number of samples):

CHR             Chromosome code
SNP             Variant identifier
(C)M            Position in centimorgans
POS             Base-pair coordinate
COUNTED         Counted allele (now defaults to REF)
ALT             Other allele(s), comma-separated
<FID>_<IID>...  Allelic dosages (missing = 'NA', haploid scaled to 0..2)

.used_sites.tsv (variant information for relaxed-PHYLIP file)

Produced by "--export phylip[-phased] used-sites". Accompanied by a .phy file.

A text file with a header line, and then one line per variant with the following 3 fields:

CHROM        Chromosome code
POS          Base-pair coordinate
NUM_SAMPLES  Number of samples with nonmissing nucleotides


.vcf, .bcf (1000 Genomes Project text Variant Call Format)

Variant information + sample ID + genotype call file; text if .vcf, binary if .bcf. Imported with --vcf/--bcf, and produced by "--export {b,v}cf".

Note that, while PLINK 2.0 supports a much larger subset of the VCF standard than PLINK 1.9, it still isn't appropriate for general-purpose VCF handling. Instead, the goal is to provide a very useful complement to bcftools. For example, PLINK 2.0 does not save per-call read depths, so any data management or analysis which requires them to be kept around should be done with bcftools or a similarly general tool; but once you're done with variant calling/imputation and are ready to treat your data as a single matrix of hardcalls or dosages (possibly with missing entries), PLINK 2.0 is much more efficient.

The VCFv4.3 files emitted by "--export vcf" start with the following three header lines:

  1. ##fileformat=VCFv4.3
  2. ##fileDate=<yyyymmdd date>
  3. ##source=PLINKv2.00

This is usually followed by all the VCF header lines (if any) present in the loaded .pvar file, a "##chrSet=" chromosome set description when appropriate, and additional "##contig=", INFO/PR, and FORMAT header lines when necessary to make the file conform to the VCF standard.

Next comes a tab-delimited header line with the following N+9 fields (where N is the number of samples), and one tab-delimited line per variant with the same fields:

#CHROM             Chromosome code
POS                Base-pair coordinate
ID                 Variant identifier
REF                Reference allele (missing = 'N')
ALT                All alternate alleles, comma-separated (missing = '.')
QUAL               Phred-scaled quality score for whether the locus is variable at all
FILTER             'PASS', '.', or semicolon-separated list of failing filter codes
INFO               Semicolon-separated list of flags and key-value pairs, with types declared in header
FORMAT             'GT', 'DS', 'HDS', and/or 'GP' can be emitted by PLINK 2
<Sample ID>, ...   Genotype/dosage calls

Allele codes are supposed to either start with '<', only contain characters in the set {A,C,G,T,N,a,c,g,t,n}, be an isolated '*', or represent a breakend. --export issues a warning if an allele code does not satisfy this restriction.
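
The allele-code restriction can be approximated with a short Python check. This is a sketch, not the test --export actually performs: allele_code_ok is a hypothetical name, and the breakend test (presence of '[' or ']') is a deliberately crude stand-in for the full breakend grammar in the VCF specification.

```python
import re

_SEQUENCE = re.compile(r"[ACGTNacgtn]+")

def allele_code_ok(code):
    """Return True if an allele code looks acceptable under the rule above."""
    if code.startswith("<"):
        return True  # symbolic allele
    if code == "*":
        return True  # isolated '*'
    if _SEQUENCE.fullmatch(code):
        return True  # plain nucleotide sequence
    # crude assumption: treat anything with a square bracket as a breakend
    return "[" in code or "]" in code
```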

The full VCFv4.3 specification is in the hts-specs GitHub repository; this includes details on the BCF binary encoding.


.vcor (LD-statistic report)

Produced by --r[2]-[un]phased when in its default tabular-output mode.

A text file with a header line, and one line per variant-pair passing all filters. The following columns are present:

Header              Column set             Contents
CHROM_A             chrom                  Chromosome code for first variant in pair
POS_A               pos                    Base-pair coordinate of first variant in pair
ID_A                id                     ID of first variant in pair
REF_A               ref [4]                Reference allele for first variant in pair
ALT1_A              alt1                   Alternate allele 1 for first variant in pair
ALT_A               alt                    Comma-separated alternate alleles for first variant in pair
PROVISIONAL_REF_A?  maybeprovref, provref  Reports whether REF_A allele is provisional
MAJ_A               maj [4]                Major allele for first variant in pair
NONMAJ_A            nonmaj                 Comma-separated nonmajor alleles for first variant in pair
NONMAJ_FREQ_A       freq                   (1 - <major-allele frequency>) for first variant in pair
CHROM_B             chrom                  Chromosome code for second variant in pair
POS_B               pos                    Base-pair coordinate of second variant in pair
ID_B                id                     ID of second variant in pair
REF_B               ref [4]                Reference allele for second variant in pair
ALT1_B              alt1                   Alternate allele 1 for second variant in pair
ALT_B               alt                    Comma-separated alternate alleles for second variant in pair
PROVISIONAL_REF_B?  maybeprovref, provref  Reports whether REF_B allele is provisional
MAJ_B               maj [4]                Major allele for second variant in pair
NONMAJ_B            nonmaj                 Comma-separated nonmajor alleles for second variant in pair
NONMAJ_FREQ_B       freq                   (1 - <major-allele frequency>) for second variant in pair
[UN]PHASED_R[2]     (required)             Variant correlation coefficient
D                   d                      Linkage disequilibrium D (phased only)
DPRIME              dprime                 Lewontin's D' (phased only)
ABS_DPRIME          dprimeabs              Absolute value of Lewontin's D' (phased only)

Sign of [UN]PHASED_R, D, and DPRIME is positive when the major (or, with 'ref-based', REF) alleles are positively correlated.

4: The 'maj' (or 'ref' when the 'ref-based' modifier is specified) column-set is included by default in --r-phased and --r-unphased's tabular output, but excluded by default for --r2-phased and --r2-unphased.
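
For readers unfamiliar with the statistics named in this table, the textbook two-locus definitions of r, D, and D' for biallelic variants are sketched below in Python. This is not PLINK's implementation (which also covers unphased and multiallelic cases); ld_stats and its parameter names are hypothetical. Taking A and B to be the major alleles yields the sign convention described above.

```python
import math

def ld_stats(p_a, p_b, p_ab):
    """Return (r, D, D') from phased haplotype frequencies.

    p_a, p_b: frequencies of the major alleles A and B.
    p_ab: frequency of haplotypes carrying both A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # undefined (NaN) when either variant is monomorphic
    d_prime = d / d_max if d_max > 0 else float("nan")
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    r = d / denom if denom > 0 else float("nan")
    return r, d, d_prime
```

Both r and D' come out positive when the two major alleles co-occur more often than expected under independence, and negative otherwise.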


.vcor{1|2}[.bin] (variant-correlation matrix)

Produced by --r[2]-[un]phased when in matrix-output mode; the exact file extension distinguishes phased vs. unphased (which appears in the component before '.vcor1' or '.vcor2'), r vs. r2, and text vs. binary format. Accompanied by a <matrix filename>.vars file containing variant IDs.

Possible shapes are the same as for .king files, except that triangular files include the diagonal.


.vmiss (variant-based missing data report)

Produced by --missing.

A text file with a header line, and then one line per variant with the following columns:

Header                 Column set             Contents
CHROM                  chrom                  Chromosome code
POS                    pos                    Base-pair coordinate
ID                     (required)             Variant ID
REF                    ref                    Reference allele
ALT                    alt                    All alternate alleles, comma-separated
PROVISIONAL_REF?       maybeprovref, provref  Reports whether REF allele is provisional
MISSING_DOSAGE_CT      nmissdosage            Number of missing dosages
MISSING_CT             nmiss                  Number of missing hardcalls, not counting het haploids
MISSING_AND_HETHAP_CT  nmisshh                Number of missing hardcalls, counting het haploids
HETHAP_CT              hethap                 Number of heterozygous haploid hardcalls
OBS_CT                 nobs                   Denominator (# samples, females excluded on chrY)
F_MISS_DOSAGE          fmissdosage            Missing dosage rate
F_MISS                 fmiss                  Missing hardcall rate, not counting het haploids
F_MISS_AND_HETHAP      fmisshh                Missing hardcall rate, counting het haploids
F_HETHAP               fhethap                Heterozygous haploid rate

When dosages are present, MISSING_DOSAGE_CT will typically be slightly lower than MISSING_CT, since hardcalls normally aren't saved for dosages in (0.1, 0.9) or (1.1, 1.9).


.vscore (text variant score report)

Produced by --variant-score.

A text file with a header line, and then one line per variant with the following columns:

Header                Column set             Contents
CHROM                 chrom                  Chromosome code
POS                   pos                    Base-pair coordinate
ID                    (required)             Variant ID
REF                   ref                    Reference allele
ALT1                  alt1                   Alternate allele 1
ALT                   alt                    All alternate alleles, comma-separated
PROVISIONAL_REF?      maybeprovref, provref  Reports whether REF allele is provisional
ALT_FREQ              altfreq                ALT total-frequency used for mean-imputation
MISSING_CT            nmiss                  Number of missing (and thus mean-imputed) dosages
OBS_CT                nobs                   Number of (nonmissing) sample observations
<Variant score>, ...  (required)             Variant scores

.vscore.bin (binary variant scores)

Produced by "--variant-score bin". Accompanied by .vscore.cols and .vscore.vars text files containing column (score) and row (variant ID) labels, respectively.

A matrix of double-precision (8-byte) floating point variant scores.
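
A matrix like this could be decoded as in the Python sketch below. Two points are assumptions rather than statements from this page: that the doubles are little-endian, and that they are stored row-major with one row per variant and one column per score. decode_vscore_matrix is a hypothetical helper; the column count would come from the number of lines in the .vscore.cols file.

```python
import struct

def decode_vscore_matrix(data, n_cols):
    """Decode raw bytes into a list of rows of n_cols floats each."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    row_bytes = 8 * n_cols  # 8 bytes per double-precision value
    if len(data) % row_bytes != 0:
        raise ValueError("file size is not a whole number of rows")
    n_rows = len(data) // row_bytes
    # assumption: little-endian doubles, row-major (variant-major) order
    values = struct.unpack(f"<{n_rows * n_cols}d", data)
    return [list(values[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]
```

In practice the bytes would be obtained with something like open(path, "rb").read(), and the row count checked against the number of lines in the .vscore.vars file.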
