D: 6 Dec 2024

Variant IDs

Significance

In some cases we need to rename variant IDs, for example when multiple variants would otherwise share an ID. See the Linkage tutorial for how we changed the IDs using --set-all-var-ids. Later in our analyses, we may want to recover the original IDs.

Objective

Show how to recover the original variant IDs using a .pvar/VCF/.bim file with the original IDs.


Let's recover rs IDs after LD pruning using --recover-var-ids. We will use ./data/processed/all_hg38_qcd_LE1 from a prior tutorial. If you do not have this fileset, see Tutorial Shortcuts.

We will randomly select 10 SNPs from the previously renamed dataset (10 is for illustration only; the operation remains fast for much larger sets). First we extract variants that still carry the altered IDs, then we compare them with the same variants after the IDs are recovered.

PLINK 2:

time plink2 \
  --pfile ./data/processed/all_hg38_qcd_LE1 \
  --make-pgen \
  --chr 22 \
  --thin-count 10 --seed 111 --threads 1 --memory 8000 require \
  --out ./data/processed/all_hg38_qcd_LE1_chr22_thin10

--thin-count 10 --seed 111 --threads 1 --memory 8000 require
Randomly sample 10 SNPs. The seed, thread, and memory settings are fixed so that the results are reproducible and match this tutorial.
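
To look at the altered IDs before recovering them, a small helper along these lines may be convenient. It is only a sketch: the function name pvar_ids is made up for this example, and it assumes an uncompressed, tab-delimited .pvar with the ID in the third column.

```shell
# Print CHROM, POS and ID for every variant in a plain-text .pvar file.
# Lines starting with '#' (meta-information and the column header) are skipped.
# Assumption: columns are ordered CHROM, POS, ID, ... and separated by tabs.
pvar_ids() {
  grep -v '^#' "$1" | cut -f1-3
}

# Example with the file written by the command above (skipped if it is absent).
pvar=./data/processed/all_hg38_qcd_LE1_chr22_thin10.pvar
if [ -f "$pvar" ]; then
  pvar_ids "$pvar"
fi
```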


time plink2 \
  --pfile ./data/processed/all_hg38_qcd_LE1_chr22_thin10 \
  --recover-var-ids ./data/raw/all_hg38.pvar.zst \
  --make-pgen \
  --out ./data/processed/all_hg38_qcd_LE1_chr22_thin10_rsids

--recover-var-ids <reference file with the original variant IDs>
Main command to revert the variant IDs. The named file holds the original variant IDs that we want to recover. Here the file is zstd-compressed, and no special handling is needed.

--pfile <fileset whose variant IDs we want to recover>
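
One way to check the result is to pair each altered ID with its recovered ID. The sketch below is not part of PLINK: compare_var_ids is a hypothetical helper, and it assumes both .pvar files are uncompressed, tab-delimited, and ordered CHROM, POS, ID, REF, ALT. Variants are matched on chromosome, position, REF and ALT, so the two files do not need to be in the same order.

```shell
# Pair each altered ID (first file) with its recovered ID (second file),
# matching variants on CHROM, POS, REF and ALT (columns 1, 2, 4, 5).
# Output columns: CHROM, POS, altered ID, recovered ID.
compare_var_ids() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines in both files
    { key = $1 ":" $2 ":" $4 ":" $5 }      # build the matching key
    NR == FNR { old[key] = $3; next }      # first file: remember the altered ID
    (key in old) { print $1 "\t" $2 "\t" old[key] "\t" $3 }
  ' "$1" "$2"
}

# Example with the two filesets from this tutorial (skipped if either is absent).
before=./data/processed/all_hg38_qcd_LE1_chr22_thin10.pvar
after=./data/processed/all_hg38_qcd_LE1_chr22_thin10_rsids.pvar
if [ -f "$before" ] && [ -f "$after" ]; then
  compare_var_ids "$before" "$after"
fi
```

Matching on a key rather than pasting the files side by side keeps the comparison valid even if a later step reorders or drops variants.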


You can open the .pvar files for both filesets, located in ./data/processed/. Below is the last variant from each file: the top line has the altered (non-rs) ID and the bottom line has the reverted rs ID. You can verify this in dbSNP by searching for the rs ID and confirming the position and alleles, keeping in mind that the coordinates are on build 38.

22      49499362        22:49499362C,G  C       G       AC=2367;AF=0.369613;CM=81.9027;AN=6404;AN_EAS=1170;AN_AMR=980;AN_EUR=1266;AN_AFR=1786;AN_SAS=1202;AN_EUR_unrel=1006;AN_EAS_unrel=1008;AN_AMR_unrel=694;AN_SAS_unrel=978;AN_AFR_unrel=1322;AF_EAS=0.266667;AF_AMR=0.264286;AF_EUR=0.339652;AF_AFR=0.596305;AF_SAS=0.250416;AF_EUR_unrel=0.345924;MAF_EUR_unrel=0.345924;AF_EAS_unrel=0.262897;MAF_EAS_unrel=0.262897;AF_AMR_unrel=0.257925;MAF_AMR_unrel=0.257925;AF_SAS_unrel=0.237219;MAF_SAS_unrel=0.237219;AF_AFR_unrel=0.586989;MAF_AFR_unrel=0.413011;AC_EAS=312;AC_AMR=259;AC_EUR=430;AC_AFR=1065;AC_SAS=301;AC_EUR_unrel=348;AC_EAS_unrel=265;AC_AMR_unrel=179;AC_SAS_unrel=232;AC_AFR_unrel=776;AC_Het_EAS=212;AC_Het_AMR=189;AC_Het_EUR=278;AC_Het_AFR=423;AC_Het_SAS=215;AC_Het_EUR_unrel=224;AC_Het_EAS_unrel=181;AC_Het_AMR_unrel=137;AC_Het_SAS_unrel=168;AC_Het_AFR_unrel=312;AC_Het=1317;AC_Hom_EAS=100;AC_Hom_AMR=70;AC_Hom_EUR=152;AC_Hom_AFR=642;AC_Hom_SAS=86;AC_Hom_EUR_unrel=124;AC_Hom_EAS_unrel=84;AC_Hom_AMR_unrel=42;AC_Hom_SAS_unrel=64;AC_Hom_AFR_unrel=464;AC_Hom=1050;HWE_EAS=0.0901093;ExcHet_EAS=0.970406;HWE_AMR=0.90745;ExcHet_AMR=0.628095;HWE_EUR=0.5956;ExcHet_EUR=0.738179;HWE_AFR=0.627022;ExcHet_AFR=0.715334;HWE_SAS=0.27659;ExcHet_SAS=0.899654;HWE=3.85487e-11;ExcHet=1

22      49499362        rs5770496       C       G       AC=2367;AF=0.369613;CM=81.9027;AN=6404;AN_EAS=1170;AN_AMR=980;AN_EUR=1266;AN_AFR=1786;AN_SAS=1202;AN_EUR_unrel=1006;AN_EAS_unrel=1008;AN_AMR_unrel=694;AN_SAS_unrel=978;AN_AFR_unrel=1322;AF_EAS=0.266667;AF_AMR=0.264286;AF_EUR=0.339652;AF_AFR=0.596305;AF_SAS=0.250416;AF_EUR_unrel=0.345924;MAF_EUR_unrel=0.345924;AF_EAS_unrel=0.262897;MAF_EAS_unrel=0.262897;AF_AMR_unrel=0.257925;MAF_AMR_unrel=0.257925;AF_SAS_unrel=0.237219;MAF_SAS_unrel=0.237219;AF_AFR_unrel=0.586989;MAF_AFR_unrel=0.413011;AC_EAS=312;AC_AMR=259;AC_EUR=430;AC_AFR=1065;AC_SAS=301;AC_EUR_unrel=348;AC_EAS_unrel=265;AC_AMR_unrel=179;AC_SAS_unrel=232;AC_AFR_unrel=776;AC_Het_EAS=212;AC_Het_AMR=189;AC_Het_EUR=278;AC_Het_AFR=423;AC_Het_SAS=215;AC_Het_EUR_unrel=224;AC_Het_EAS_unrel=181;AC_Het_AMR_unrel=137;AC_Het_SAS_unrel=168;AC_Het_AFR_unrel=312;AC_Het=1317;AC_Hom_EAS=100;AC_Hom_AMR=70;AC_Hom_EUR=152;AC_Hom_AFR=642;AC_Hom_SAS=86;AC_Hom_EUR_unrel=124;AC_Hom_EAS_unrel=84;AC_Hom_AMR_unrel=42;AC_Hom_SAS_unrel=64;AC_Hom_AFR_unrel=464;AC_Hom=1050;HWE_EAS=0.0901093;ExcHet_EAS=0.970406;HWE_AMR=0.90745;ExcHet_AMR=0.628095;HWE_EUR=0.5956;ExcHet_EUR=0.738179;HWE_AFR=0.627022;ExcHet_AFR=0.715334;HWE_SAS=0.27659;ExcHet_SAS=0.899654;HWE=3.85487e-11;ExcHet=1
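
Because the INFO column is very long, it can help to print only the first five columns of the last variant. The following is a minimal sketch under the same assumptions as before (plain-text, tab-delimited .pvar); last_variant is a name chosen for this example.

```shell
# Print CHROM, POS, ID, REF and ALT of the last variant in a .pvar file,
# dropping the long INFO column. Header lines starting with '#' are ignored.
last_variant() {
  grep -v '^#' "$1" | tail -n 1 | cut -f1-5
}

# Example: show the last variant before and after ID recovery
# (each file is skipped if it is absent).
for f in ./data/processed/all_hg38_qcd_LE1_chr22_thin10.pvar \
         ./data/processed/all_hg38_qcd_LE1_chr22_thin10_rsids.pvar; do
  if [ -f "$f" ]; then
    last_variant "$f"
  fi
done
```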
