snipar.scripts.gwas module

Infers direct effects, non-transmitted coefficients (NTCs), and population effects of genome-wide SNPs on a phenotype.

At minimum, the script requires observed genotypes on phenotyped individuals along with a phenotype file. If no imputed parental genotypes are provided, a pedigree file is required, and by default the script will analyze samples with siblings and/or both parents genotyped.
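The input requirements described above can be sketched as a small pre-flight check. This is only an illustration of the stated rule, not snipar's own validation code; the function name `check_minimal_inputs` and its keyword names are hypothetical.

```python
from typing import List, Optional


def check_minimal_inputs(
    phenofile: Optional[str],
    bed: Optional[str] = None,
    bgen: Optional[str] = None,
    imp: Optional[str] = None,
    pedigree: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the minimum is met.

    Hypothetical helper that mirrors the requirements in the text:
    a phenotype file, observed genotypes, and either imputed parental
    genotypes or a pedigree file.
    """
    problems = []
    if not phenofile:
        problems.append("a phenotype file is required")
    if bed is None and bgen is None:
        problems.append("observed genotypes are required (--bed or --bgen)")
    if imp is None and pedigree is None:
        problems.append("--pedigree is required when --imp is not given")
    return problems


# Observed genotypes plus a pedigree, no imputed parental genotypes.
ok = check_minimal_inputs("phenotype.txt", bed="genotypes/chr_@", pedigree="pedigree.txt")
# Neither --imp nor --pedigree supplied.
missing = check_minimal_inputs("phenotype.txt", bed="genotypes/chr_@")
```

The file names are placeholders; a check like this runs before any genotype data is loaded, so an incomplete set of options is caught early.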

Args:
'-h', '--help', default===SUPPRESS==

show this help message and exit

: str

Location of the phenotype file

'--bgen' : str

Address of the phased genotypes in .bgen format. If there is a @ in the address, @ is replaced by the chromosome numbers in the range of chr_range for each chromosome (chr_range is an optional parameter for this script).

'--bed' : str

Address of the unphased genotypes in .bed format. If there is a @ in the address, @ is replaced by the chromosome numbers in the range of chr_range for each chromosome (chr_range is an optional parameter for this script).

'--imp' : str

Address of hdf5 files with imputed parental genotypes (without .hdf5 suffix). If there is a @ in the address, @ is replaced by the chromosome numbers in the range of chr_range (chr_range is an optional parameter for this script).

'--pedigree' : str

Address of pedigree file. Must be provided if not providing imputed parental genotypes.

'--covar' : str

Path to file with covariates: plain text file with columns FID, IID, covar1, covar2, ...

'--chr_range'

Chromosomes to analyse. Should be a series of ranges with x-y format (e.g. 1-22) or integers.

'--out' : str, default=./

The summary statistics are written to this path, one file for each chromosome. If the path contains ‘@’, the ‘@’ is replaced with the chromosome number. Otherwise, the summary statistics are written to the given path with file names chr_1.sumstats.gz, chr_2.sumstats.gz, etc. for the text sumstats, and chr_1.sumstats.hdf5, etc. for the HDF5 sumstats.

'--grm' : str

Path to GRM file giving pairwise relatedness information. Designed to work with KING IBD segment inference output (.seg file).

'--grmgz' : str

Path to GRM in GCTA grm.gz format (without .grm.gz suffix). Assumes .grm.id file with same root path also available.

'--sparse_thresh' : float, default=0.05

Threshold of GRM sparsity — elements below this value are set to zero

'--impute_unrel'

Whether to include unrelated individuals and impute their parental genotypes linearly. See the Unified estimator in Guan et al.

'--robust'

Use the robust estimator

'--sib_diff'

Use the sibling difference method

'--parsum'

Regress onto proband and sum of (imputed/observed) maternal and paternal genotypes. Default uses separate paternal and maternal genotypes when available.

'--fit_sib'

Fit indirect effect from sibling

'--phen' : str

Name of the phenotype to be analysed — case sensitive. Default uses first phenotype in file.

'--phen_index' : int, default=1

If the phenotype file contains multiple phenotypes, which phenotype should be analysed (default 1, first)

'--missing_char' : str, default=NA

Missing value string in phenotype file (default NA)

'--min_maf' : float, default=0.01

Ignore SNPs with minor allele frequency below min_maf (default 0.01)

'--max_missing' : float, default=5

Ignore SNPs with greater percent missing calls than max_missing (default 5)

'--vc_out' : str

Prefix of output filename for variance component array (without .npy).

'--vc_list' : float

Pass in variance components as a list of floats.

'--no_sib_var'

Do not fit sibling variance component. Not recommended for family-GWAS.

'--keep' : str

Filename of IDs to be kept for analysis (No header).

'--cpus' : int, default=1

Number of cpus to distribute batches across

'--threads' : int, default=1

Number of threads to use per CPU. Uses all available by default.

'--no_hdf5_out'

Suppress HDF5 output of summary statistics

'--batch_size' : int, default=100000

Number of SNPs to load at a time (lower this value to reduce memory requirements)
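Several of the options above (the genotype, imputed-genotype, and output paths) substitute ‘@’ with chromosome numbers taken from chr_range. The following is a minimal sketch of that convention, not snipar's implementation. The helper names are hypothetical. It assumes the `.sumstats.gz` suffix is appended after ‘@’ substitution, and that without ‘@’ the `chr_N` file name is joined to the output path as a directory.

```python
import os
from typing import Dict, List


def parse_chr_range(tokens: List[str]) -> List[int]:
    """Expand tokens such as ['1-3', '7'] into [1, 2, 3, 7]."""
    chroms = []
    for token in tokens:
        if "-" in token:
            # An x-y range is inclusive at both ends.
            start, end = token.split("-", 1)
            chroms.extend(range(int(start), int(end) + 1))
        else:
            chroms.append(int(token))
    return sorted(set(chroms))


def expand_template(template: str, chroms: List[int]) -> List[str]:
    """Replace '@' with each chromosome number; without '@' return the path as-is."""
    if "@" not in template:
        return [template]
    return [template.replace("@", str(c)) for c in chroms]


def sumstats_paths(out: str, chroms: List[int]) -> Dict[int, str]:
    """Map each chromosome to a text summary statistics path (assumed layout)."""
    paths = {}
    for c in chroms:
        if "@" in out:
            # Assumption: the suffix is appended after '@' substitution.
            paths[c] = out.replace("@", str(c)) + ".sumstats.gz"
        else:
            # Assumption: 'out' is treated as a directory.
            paths[c] = os.path.join(out, "chr_" + str(c) + ".sumstats.gz")
    return paths


chroms = parse_chr_range(["1-3", "7"])
inputs = expand_template("genotypes/chr_@", chroms)
outputs = sumstats_paths("./", chroms)
```

With the default output path `./`, this sketch produces the names given in the `--out` description, e.g. `./chr_1.sumstats.gz`.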

Results:
sumstats.gz

For each chromosome, a gzipped text file containing the SNP level summary statistics.
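The gzipped text output can be inspected with the Python standard library alone. The sketch below assumes the file is whitespace-delimited with a single header row. The column names are not documented in this section, so the reader takes them from the header instead of hard-coding them. `read_sumstats` is a hypothetical helper, not part of snipar.

```python
import gzip
from typing import Dict, Iterator


def read_sumstats(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per SNP row, keyed by the header names in the file.

    Assumes a whitespace-delimited text file with one header line.
    Values are left as strings; convert them as needed.
    """
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                yield dict(zip(header, fields))
```

For example, `next(read_sumstats("chr_1.sumstats.gz"))` would return the first SNP's row as a dictionary, which also shows which columns the file contains.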

snipar.scripts.gwas.main(args)

Calling this function with args is equivalent to running this script from the command line with the same arguments.

Args:

args: list

List of all the desired options and arguments. The possible values are all the values you can pass to this script from the command line.
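As a rough illustration, an argument list for `main` can be built from an ordinary command-line string with `shlex.split`. The file names below are placeholders, and `build_gwas_args` and `run_gwas` are hypothetical wrappers. The sketch assumes `main` accepts the list form described above, which may not hold for every snipar version.

```python
import shlex
from typing import List


def build_gwas_args(command_line: str) -> List[str]:
    """Split a shell-style option string into the list form described above."""
    return shlex.split(command_line)


# Placeholder file names: phenotype file first, then options.
args = build_gwas_args(
    "phenotype.txt --bed genotypes/chr_@ --pedigree pedigree.txt "
    "--chr_range 1-22 --out results/chr_@"
)


def run_gwas(arg_list: List[str]) -> None:
    """Hypothetical wrapper: pass the list to snipar's entry point."""
    # Imported here so the sketch can be read and tested without snipar installed.
    from snipar.scripts.gwas import main
    main(arg_list)
```

Calling `run_gwas(args)` would then run the analysis with the same options as the equivalent command line.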