Distributed computation
--parallel <1-based current job index> <total job pieces>
--parallel splits a job into the given number of pieces and performs only the piece with the given 1-based index; that index is appended to the main output filename. (If the main output file is gzipped, the file extension will instead be of the form <usual extension before .gz>.<1-based index>.gz.)
Use Unix cat on the resulting files to assemble the full computation result. (For gzipped files, it is safe to do this either before or after decompression.) For example:
[chrchang:~/plink-ng]$ plink --bfile test_data --distance triangle bin --parallel 1 2 --out result
PLINK v1.90b6.9 64-bit (4 Mar 2019) www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to result.log.
Options in effect:
--bfile test_data
--distance triangle bin
--out result
--parallel 1 2
4096 MB RAM detected; reserving 2048 MB for main workspace.
100000 variants loaded from .bim file.
1000 people (1000 males, 0 females) loaded from .fam.
1000 phenotype values loaded from .fam.
Using up to 2 threads (change this with --threads).
Before main variant filters, 1000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
100000 variants and 1000 people pass filters and QC.
Phenotype data is quantitative.
Distance matrix calculation complete.
IDs written to result.dist.id .
Distances (allele counts) written to result.dist.bin.1 .
[chrchang:~/plink-ng]$ plink --bfile test_data --distance triangle bin --parallel 2 2 --out result
PLINK v1.90b6.9 64-bit (4 Mar 2019) www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to result.log.
Options in effect:
--bfile test_data
--distance triangle bin
--out result
--parallel 2 2
100000 variants loaded from .bim file.
1000 people (1000 males, 0 females) loaded from .fam.
1000 phenotype values loaded from .fam.
Using up to 2 threads (change this with --threads).
Before main variant filters, 1000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
100000 variants and 1000 people pass filters and QC.
Phenotype data is quantitative.
Distance matrix calculation complete.
Distances (allele counts) written to result.dist.bin.2 .
[chrchang:~/plink-ng]$ cat result.dist.bin.1 result.dist.bin.2 > result.dist.bin
[chrchang:~/plink-ng]$ plink --bfile test_data --read-dists result.dist.bin --regress-distance
PLINK v1.90b6.9 64-bit (4 Mar 2019) www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
--bfile test_data
--read-dists result.dist.bin
--regress-distance
4096 MB RAM detected; reserving 2048 MB for main workspace.
100000 variants loaded from .bim file.
1000 people (1000 males, 0 females) loaded from .fam.
1000 phenotype values loaded from .fam.
Using up to 2 threads (change this with --threads).
Before main variant filters, 1000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
100000 variants and 1000 people pass filters and QC.
Phenotype data is quantitative.
--read-dists: 499500 values loaded.
Phenotype stdev: 1.01927
Regression slope (y = genomic distance, x = avg phenotype): -17.9796
Regression slope (y = avg phenotype, x = genomic distance): -4.39263e-05
Setting d=63 for jackknife.
Jackknife s.e.: 7.03741
Jackknife s.e. (y = avg phenotype): 1.72336e-05
This sequence of commands writes the first half of the (triangular binary) distance matrix to result.dist.bin.1, the second half to result.dist.bin.2, assembles the full triangular binary matrix file with cat, and then loads the full matrix for analysis with --regress-distance.
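The same workflow applies when the matrix is written in gzipped text form. The following is a minimal sketch (not a captured session) assuming the same test_data fileset; with --distance's 'gz' modifier each piece is named result.dist.<index>.gz as described above, and because it is safe to concatenate before decompression, the pieces can be joined and then gunzipped:

plink --bfile test_data --distance triangle gz --parallel 1 2 --out result
plink --bfile test_data --distance triangle gz --parallel 2 2 --out result
cat result.dist.1.gz result.dist.2.gz > result.dist.gz
gunzip result.dist.gz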
Currently, the --r/--r2, --distance, --genome, --make-rel, --make-grm-gz/--make-grm-bin, --epistasis, and --fast-epistasis flags directly support distributed computation. For most matrix computations, either the 'square0' or 'triangle' output shape must be used.
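Since the per-piece invocations differ only in the --parallel index, splitting into more than two pieces is easy to script. The sketch below (not from the PLINK documentation) runs all pieces of the triangular binary distance computation above serially on one machine and then reassembles them; on a cluster, each loop iteration would instead be submitted as a separate job.

#!/bin/bash
# Assumes the test_data fileset from the example above.
PIECES=8
for i in $(seq 1 $PIECES); do
  plink --bfile test_data --distance triangle bin --parallel $i $PIECES --out result
done
# Concatenate the pieces in index order to obtain the full triangular matrix.
for i in $(seq 1 $PIECES); do cat result.dist.bin.$i; done > result.dist.bin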
--write-var-ranges <block ct>
Many simpler jobs can be distributed by providing an appropriate range to --snps on each machine. To facilitate this, --write-var-ranges divides the variant set into the requested number of equal-size blocks and writes the block boundaries to plink.var.ranges. (Sizes will vary by 1 if the total variant count is not divisible by the requested block count.) A sketch of one way to use this file follows.
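The sketch below splits a job into four pieces by variant range. It assumes the .var.ranges file contains a header line followed by one "<first variant ID> <last variant ID>" pair per block, and uses --freq as a placeholder for whatever per-variant computation is being distributed; the freq_part<n> output prefixes are likewise just for illustration.

# Write 4 variant ranges to plink.var.ranges.
plink --bfile test_data --write-var-ranges 4

# Launch one job per range (serially here; submit separately on a cluster).
n=0
tail -n +2 plink.var.ranges | while read first last; do
  n=$((n+1))
  plink --bfile test_data --snps "$first-$last" --freq --out freq_part$n
done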