snipar.scripts.gwas module
Infers direct effects, non-transmitted coefficients (NTCs), and population effects of genome-wide SNPs on a phenotype.
At minimum, the script requires observed genotypes of phenotyped individuals and a phenotype file. If no imputed parental genotypes are provided, a pedigree file is required, and by default the script analyzes samples with genotyped siblings and/or both parents genotyped.
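The default sample selection described above can be illustrated with a minimal Python sketch. It assumes a pedigree given as (FID, IID, FATHER_ID, MOTHER_ID) tuples, treats "0" and "NA" as unknown-parent codes, and defines full siblings as individuals sharing the same known father and mother. The function name `select_family_samples` is hypothetical, and this is not snipar's own implementation.

```python
from collections import defaultdict

MISSING = {"0", "NA"}  # assumed codes for an unknown parent


def select_family_samples(pedigree, genotyped):
    """Keep genotyped individuals that have a genotyped full sibling
    and/or both parents genotyped.

    pedigree: iterable of (FID, IID, FATHER_ID, MOTHER_ID) tuples (assumed layout).
    genotyped: set of IIDs with observed genotypes.
    """
    # Group genotyped individuals by their (father, mother) pair
    sibships = defaultdict(set)
    parents = {}
    for _fid, iid, father, mother in pedigree:
        parents[iid] = (father, mother)
        if iid in genotyped and father not in MISSING and mother not in MISSING:
            sibships[(father, mother)].add(iid)

    keep = set()
    for iid, (father, mother) in parents.items():
        if iid not in genotyped:
            continue
        both_parents = father in genotyped and mother in genotyped
        # A sibship with at least two genotyped members gives each a genotyped sibling
        has_sib = len(sibships.get((father, mother), ())) > 1
        if both_parents or has_sib:
            keep.add(iid)
    return keep


ped = [
    ("f1", "a", "p1", "m1"),
    ("f1", "b", "p1", "m1"),
    ("f2", "c", "p2", "m2"),
    ("f3", "d", "0", "0"),
]
print(sorted(select_family_samples(ped, {"a", "b", "c", "p2", "m2", "d"})))
# ['a', 'b', 'c']
```

Here `a` and `b` are kept as genotyped siblings, `c` because both parents are genotyped, and `d` is dropped because neither condition holds.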
- Args:
- ‘-h’, ‘--help’, default=‘==SUPPRESS==’
Show this help message and exit.
- : str
Location of the phenotype file
- ‘--bgen’ : str
Address of the phased genotypes in .bgen format. If the address contains @, the @ is replaced by the chromosome number for each chromosome in chr_range (an optional parameter of this script).
- ‘--bed’ : str
Address of the unphased genotypes in .bed format. If the address contains @, the @ is replaced by the chromosome number for each chromosome in chr_range (an optional parameter of this script).
- ‘--imp’ : str
Address of the HDF5 files with imputed parental genotypes (without the .hdf5 suffix). If the address contains @, the @ is replaced by the chromosome number for each chromosome in chr_range (an optional parameter of this script).
- ‘--pedigree’ : str
Address of the pedigree file. Must be provided if imputed parental genotypes are not provided.
- ‘--covar’ : str
Path to a file with covariates: a plain text file with columns FID, IID, covar1, covar2, ...
- ‘--chr_range’
Chromosomes to analyse. Should be a series of ranges in x-y format (e.g. 1-22) or integers.
- ‘--out’ : str, default=./
The summary statistics are written to this path, one file per chromosome. If the path contains ‘@’, the ‘@’ is replaced with the chromosome number. Otherwise, the summary statistics are written to the given path with file names chr_1.sumstats.gz, chr_2.sumstats.gz, etc. for the text sumstats, and chr_1.sumstats.hdf5, etc. for the HDF5 sumstats.
- ‘--grm’ : str
Path to a GRM file giving pairwise relatedness information. Designed to work with KING IBD segment inference output (.seg file).
- ‘--grmgz’ : str
Path to a GRM in GCTA grm.gz format (without the .grm.gz suffix). Assumes a .grm.id file with the same root path is also available.
- ‘--sparse_thresh’ : float, default=0.05
Threshold of GRM sparsity: elements below this value are set to zero.
- ‘--impute_unrel’
Whether to include unrelated individuals and impute their parental genotypes linearly. See the unified estimator in Guan et al.
- ‘--robust’
Use the robust estimator.
- ‘--sib_diff’
Use the sibling difference method.
- ‘--parsum’
Regress onto the proband genotype and the sum of (imputed/observed) maternal and paternal genotypes. By default, separate paternal and maternal genotypes are used when available.
- ‘--fit_sib’
Fit the indirect effect from siblings.
- ‘--phen’ : str
Name of the phenotype to be analysed (case sensitive). Default uses the first phenotype in the file.
- ‘--phen_index’ : int, default=1
If the phenotype file contains multiple phenotypes, which phenotype to analyse (default 1, the first).
- ‘--missing_char’ : str, default=NA
Missing value string in the phenotype file (default NA).
- ‘--min_maf’ : float, default=0.01
Ignore SNPs with minor allele frequency below min_maf (default 0.01).
- ‘--max_missing’ : float, default=5
Ignore SNPs with a higher percentage of missing calls than max_missing (default 5).
- ‘--vc_out’ : str
Prefix of the output filename for the variance component array (without .npy).
- ‘--vc_list’ : float
Pass in variance components as a list of floats.
- ‘--no_sib_var’
Do not fit the sibling variance component. Not recommended for family-GWAS.
- ‘--keep’ : str
Filename of IDs to be kept for analysis (no header).
- ‘--cpus’ : int, default=1
Number of CPUs to distribute batches across.
- ‘--threads’ : int, default=1
Number of threads to use per CPU. Uses all available by default.
- ‘--no_hdf5_out’
Suppress HDF5 output of summary statistics.
- ‘--batch_size’ : int, default=100000
Batch size of SNPs to load at a time (reduce to lower memory requirements).
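The @ substitution used by --bgen, --bed and --imp, together with the --chr_range syntax, can be sketched as below. This is an illustration under assumptions, not snipar's parser: it assumes --chr_range arrives as a list of string tokens such as ["1-22"] or ["1-3", "5"], and the helper names `parse_chr_range` and `expand_address` are hypothetical.

```python
def parse_chr_range(tokens):
    """Expand tokens like ["1-3", "5"] into [1, 2, 3, 5].

    Assumed semantics: "x-y" is an inclusive range, a bare integer is one chromosome.
    """
    chroms = []
    for tok in tokens:
        if "-" in tok:
            start, end = (int(part) for part in tok.split("-", 1))
            chroms.extend(range(start, end + 1))
        else:
            chroms.append(int(tok))
    return chroms


def expand_address(address, chroms):
    """Map each chromosome to its file address by replacing every '@'.

    If the address has no '@', str.replace leaves it unchanged, so every
    chromosome maps to the same file (assumed: one file holding all chromosomes).
    """
    return {chrom: address.replace("@", str(chrom)) for chrom in chroms}


chroms = parse_chr_range(["1-3", "22"])
print(chroms)                                         # [1, 2, 3, 22]
print(expand_address("genotypes/chr_@", chroms)[22])  # genotypes/chr_22
```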
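The effect of --sparse_thresh can be sketched with a dictionary-of-keys representation of the GRM, in which only retained pairs are stored. This minimal illustration assumes relatedness values keyed by (IID1, IID2) pairs and applies the documented rule literally (values below the threshold become zero, that is, are not stored). `sparsify_grm` is a hypothetical name rather than snipar's function.

```python
def sparsify_grm(grm, sparse_thresh=0.05):
    """Drop GRM entries below sparse_thresh.

    grm: dict mapping (iid1, iid2) -> relatedness.
    Entries that are not stored are implicitly zero, so the result is a sparse
    (dictionary-of-keys) version of the input.
    """
    return {pair: value for pair, value in grm.items() if value >= sparse_thresh}


grm = {("a", "a"): 1.0, ("a", "b"): 0.51, ("a", "c"): 0.01, ("b", "c"): -0.02}
print(sparsify_grm(grm))  # {('a', 'a'): 1.0, ('a', 'b'): 0.51}
```

Zeroing weak relatedness estimates keeps the matrix sparse, so storage and computation scale with the number of related pairs rather than with all pairs of individuals.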
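For intuition about --sib_diff, one common formulation of a sibling-difference estimator regresses within-sibship phenotype deviations on within-sibship genotype deviations, so that family-level offsets cancel; for sibling pairs this is equivalent to regressing phenotype differences on genotype differences. The sketch below assumes family IDs identify sibships and is not necessarily how snipar implements the method; `sib_diff_estimate` is a hypothetical name.

```python
from collections import defaultdict


def sib_diff_estimate(fam_ids, genotypes, phenotypes):
    """Estimate a direct effect from within-family variation only.

    Each individual's genotype and phenotype are centred on their family means,
    then the slope of phenotype deviations on genotype deviations is returned.
    """
    groups = defaultdict(list)
    for fam, g, y in zip(fam_ids, genotypes, phenotypes):
        groups[fam].append((g, y))

    sxy = sxx = 0.0
    for members in groups.values():
        if len(members) < 2:
            continue  # singletons carry no within-family information
        g_mean = sum(g for g, _ in members) / len(members)
        y_mean = sum(y for _, y in members) / len(members)
        for g, y in members:
            sxy += (g - g_mean) * (y - y_mean)
            sxx += (g - g_mean) ** 2
    if sxx == 0:
        raise ValueError("no within-family genotype variation")
    return sxy / sxx


fams = ["f1", "f1", "f2", "f2"]
geno = [0, 2, 1, 2]
phen = [1.0, 5.0, 13.0, 15.0]  # family f2 is shifted by +10; within-family slope is 2
print(sib_diff_estimate(fams, geno, phen))  # 2.0
```

The +10 shift in family f2 does not affect the estimate, which is the point of using only within-family differences.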
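The --min_maf and --max_missing filters can be sketched for a single SNP as follows. The sketch assumes genotypes coded as allele counts 0/1/2 with None marking a missing call, and reads max_missing as a percentage, consistent with its default of 5. `passes_filters` is a hypothetical name, and snipar may compute these quantities differently.

```python
def passes_filters(genotypes, min_maf=0.01, max_missing=5.0):
    """Return True if a SNP survives the MAF and missingness filters.

    genotypes: allele counts (0, 1, 2) per individual, None for a missing call.
    max_missing: maximum allowed percentage of missing calls.
    """
    if not genotypes:
        return False
    observed = [g for g in genotypes if g is not None]
    missing_pct = 100.0 * (len(genotypes) - len(observed)) / len(genotypes)
    # Drop SNPs with a higher percentage of missing calls than max_missing
    if missing_pct > max_missing or not observed:
        return False
    # Frequency of the counted allele; the MAF is the smaller of p and 1 - p
    freq = sum(observed) / (2 * len(observed))
    maf = min(freq, 1.0 - freq)
    # Drop SNPs with MAF below min_maf
    return maf >= min_maf


print(passes_filters([0, 1, 2, 1]))                  # True  (MAF 0.5, no missing calls)
print(passes_filters([1] + [None] * 10 + [0] * 89))  # False (10% missing)
```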
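--batch_size controls how many SNPs are loaded at once, and --cpus controls how many workers the batches are spread across. A minimal sketch of that bookkeeping follows. The round-robin assignment of batches to CPUs is an assumption for illustration, and `snp_batches` and `assign_batches` are hypothetical names, not snipar's scheduling code.

```python
def snp_batches(n_snps, batch_size=100000):
    """Split SNP indices 0..n_snps-1 into consecutive (start, stop) ranges."""
    return [(start, min(start + batch_size, n_snps))
            for start in range(0, n_snps, batch_size)]


def assign_batches(batches, cpus=1):
    """Assign batches to CPUs round-robin (an assumed strategy)."""
    return [batches[i::cpus] for i in range(cpus)]


batches = snp_batches(250000)
print(batches)  # [(0, 100000), (100000, 200000), (200000, 250000)]
print(assign_batches(batches, cpus=2))
# [[(0, 100000), (200000, 250000)], [(100000, 200000)]]
```

A smaller batch size means fewer SNPs held in memory at any one time, at the cost of more batches to process.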
- Results:
- sumstats.gz
For each chromosome, a gzipped text file containing the SNP level summary statistics.