snipar.gtarray module

class snipar.gtarray.gtarray(garray, ids, sid=None, alleles=None, pos=None, chrom=None, map=None, error_probs=None, fams=None, par_status=None, ped=None)[source]

Bases: object

Define a genotype or PGS array that stores individual IDs, family IDs, and SNP information.

Args:
garray : array

2- or 3-dimensional numpy array of genotypes/PGS values. The first dimension is individuals. For a 2-dimensional array, the second dimension is SNPs or PGS values. For a 3-dimensional array, the second dimension indexes the genotypes of the individual and his/her relatives (for example: proband, paternal, and maternal), and the third dimension is the SNPs.

ids : array

vector of individual IDs of same length as first dimension of garray

sid : array

vector of SNP IDs, equal in length (L) to the last dimension of garray

alleles : array

[L x 2] matrix of ref and alt alleles for the SNPs. L must match size of sid

pos : array

vector of SNP positions; must match size of sid

chrom : array

vector of SNP chromosomes; must match size of sid

map : array

vector of SNP genetic map positions; must match size of sid

fams : array

vector of family IDs; must match size of ids

par_status : numpy.array

[N x 2] numpy matrix that records whether parents have observed or imputed genotypes/PGS, where N matches size of ids. The first column is for the father of that individual; the second column is for the mother of that individual. If the parent is neither observed nor imputed, the value is -1; if observed, 0; and if imputed, 1.
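The size constraints listed above can be checked with plain NumPy before anything is handed to the constructor. The following is a minimal sketch with made-up data; `shapes_consistent` is a hypothetical helper written for this illustration and is not part of snipar.

```python
import numpy as np

n_ind, n_snp = 4, 3

# 3-dimensional genotype array: (individuals, [proband, father, mother], SNPs)
garray = np.zeros((n_ind, 3, n_snp))
ids = np.array(["i1", "i2", "i3", "i4"])
fams = np.array(["f1", "f1", "f2", "f3"])
sid = np.array(["rs1", "rs2", "rs3"])
alleles = np.array([["A", "G"], ["C", "T"], ["G", "A"]])  # [L x 2]: ref, alt
pos = np.array([1000, 2000, 3000])
chrom = np.array([1, 1, 1])

# Columns: father, mother. -1 = neither observed nor imputed, 0 = observed, 1 = imputed
par_status = np.array([[0, 0], [0, 1], [1, 1], [-1, 0]])


def shapes_consistent(garray, ids, sid, alleles, pos, chrom, fams, par_status):
    """Hypothetical helper: check the length constraints from the argument list."""
    n, L = garray.shape[0], garray.shape[-1]
    return (
        ids.shape[0] == n
        and fams.shape[0] == n
        and par_status.shape == (n, 2)
        and sid.shape[0] == L
        and alleles.shape == (L, 2)
        and pos.shape[0] == L
        and chrom.shape[0] == L
    )


ok = shapes_consistent(garray, ids, sid, alleles, pos, chrom, fams, par_status)

# Number of individuals whose father and mother are both observed
n_both_observed = int(np.sum(np.all(par_status == 0, axis=1)))
```

With arrays like these, the call described by the signature above would presumably take the form `gtarray(garray, ids, sid=sid, alleles=alleles, pos=pos, chrom=chrom, fams=fams, par_status=par_status)`.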

Returns:

G : snipar.gtarray

add(garray)[source]

Adds another gtarray of the same dimension to this array and returns the sum. It matches IDs before summing.
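The ID matching step can be illustrated with plain NumPy. This is only a sketch of the idea, assuming both arrays contain exactly the same set of IDs; `add_matched` is a hypothetical function, and the method itself may handle non-overlapping IDs differently.

```python
import numpy as np


def add_matched(g1, ids1, g2, ids2):
    """Add g2 to g1 after reordering the rows of g2 to follow ids1.

    Assumes ids1 and ids2 contain the same IDs, possibly in a different order.
    """
    row_of = {iid: row for row, iid in enumerate(ids2)}
    order = np.array([row_of[iid] for iid in ids1])
    return g1 + g2[order]


ids1 = np.array(["a", "b", "c"])
ids2 = np.array(["c", "a", "b"])
g1 = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.0]])
g2 = np.array([[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]])  # rows for c, a, b

total = add_matched(g1, ids1, g2, ids2)  # rows follow ids1: a, b, c
```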

compute_freqs()[source]

Computes the allele frequencies of the SNPs and stores them in self.freqs.
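As a rough illustration of what a frequency computation involves, the sketch below assumes genotypes are coded as 0/1/2 copies of an allele and that missing values are stored as NaN. Both are assumptions made for this example, and `allele_freqs` is a hypothetical function rather than the library's implementation.

```python
import numpy as np


def allele_freqs(G):
    """Per-SNP allele frequency: mean genotype over non-missing entries, divided by 2."""
    return np.nanmean(G, axis=0) / 2.0


# Rows are individuals, columns are SNPs; NaN marks a missing genotype
G = np.array([
    [0.0, 1.0, 2.0],
    [1.0, np.nan, 2.0],
    [2.0, 1.0, 2.0],
    [1.0, 0.0, np.nan],
])

freqs = allele_freqs(G)
```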

compute_info()[source]

diagonalise(inv_root)[source]

Transforms the genotype array using the inverse square root of the phenotypic covariance matrix from the family-based linear mixed model.
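The transformation can be sketched for a single family block. The example assumes a 2 x 2 phenotypic covariance matrix for two siblings and computes its inverse square root by eigendecomposition; `inverse_sqrt` and the numbers are hypothetical, and how the method applies `inv_root` across families is not described here.

```python
import numpy as np


def inverse_sqrt(sigma):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


# Assumed covariance between two siblings' phenotypes (illustrative values)
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
inv_root = inverse_sqrt(sigma)

# Genotypes of the two siblings at two SNPs
G = np.array([[0.0, 2.0],
              [1.0, 1.0]])

# After this transformation the residual covariance of the block is the identity
G_transformed = inv_root @ G
whitened = inv_root @ sigma @ inv_root
```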

fill_NAs()[source]

This normalises the SNP columns to have mean zero, then fills in NA values with zero. After centring, zero is the column mean, so this amounts to mean imputation.
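The two steps, centring and then filling, can be sketched as follows. The example assumes missing values are stored as NaN, which may differ from the class's internal representation; `centre_and_fill` is a hypothetical function.

```python
import numpy as np


def centre_and_fill(G):
    """Subtract each column's mean (ignoring NaN), then set missing entries to 0."""
    centred = G - np.nanmean(G, axis=0)
    return np.where(np.isnan(centred), 0.0, centred)


G = np.array([
    [0.0, 2.0],
    [2.0, np.nan],
    [np.nan, 0.0],
])

filled = centre_and_fill(G)
```

Because the fill value equals the centred column mean, the column means stay at zero after filling.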

filter(filter_pass)[source]

filter_ids(keep_ids, verbose=False)[source]

Keeps only the individuals whose IDs are given in keep_ids.
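Filtering by ID amounts to building a boolean mask over the individuals and applying it to every per-individual array. A minimal NumPy sketch with made-up data (not the library's code):

```python
import numpy as np

ids = np.array(["i1", "i2", "i3", "i4"])
fams = np.array(["f1", "f1", "f2", "f3"])
G = np.arange(8, dtype=float).reshape(4, 2)  # 4 individuals x 2 SNPs

keep_ids = np.array(["i2", "i4", "i9"])  # "i9" is absent and is simply ignored

# True for each individual whose ID appears in keep_ids
keep = np.isin(ids, keep_ids)

# Apply the same mask to every array indexed by individual
G_kept, ids_kept, fams_kept = G[keep], ids[keep], fams[keep]
```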

filter_info(min_info=0.99, verbose=False)[source]

filter_maf(min_maf=0.01, verbose=False)[source]

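Filters SNPs, keeping those with minor allele frequency (MAF) greater than min_maf. The description also mentions keeping SNPs whose percentage of missing observations is less than max_missing; that threshold is a parameter of filter_missingness rather than of this method.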
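The MAF and missingness thresholds can be sketched as a per-SNP boolean mask. The example assumes 0/1/2 genotype coding, NaN for missing values, and that max_missing is a percentage, as the description suggests; `snp_filter` is a hypothetical function, not the library's implementation.

```python
import numpy as np


def snp_filter(G, min_maf=0.01, max_missing=5):
    """Return a boolean mask of SNPs passing the MAF and missingness thresholds."""
    freqs = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(freqs, 1.0 - freqs)  # frequency of the rarer allele
    pct_missing = 100.0 * np.mean(np.isnan(G), axis=0)
    return (maf > min_maf) & (pct_missing < max_missing)


G = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [2.0, np.nan, 0.0],
    [1.0, 2.0, 0.0],
])

# SNP 0 passes; SNP 1 has 25% missing; SNP 2 is monomorphic (MAF = 0)
mask = snp_filter(G)
G_filtered = G[:, mask]
```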

filter_missingness(max_missing=5, verbose=False)[source]

mean_normalise()[source]

This normalises the SNP/PGS columns to have mean zero.

scale()[source]

This normalises the SNP/PGS columns to have variance 1.
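Centring and scaling can be sketched column by column in NumPy. The example assumes NaN marks missing values and uses the population standard deviation (ddof=0); which convention the class uses is not stated here, and both functions below are hypothetical stand-ins.

```python
import numpy as np


def mean_normalise(G):
    """Subtract each column's mean, ignoring missing values."""
    return G - np.nanmean(G, axis=0)


def scale(G):
    """Divide each column by its standard deviation, ignoring missing values."""
    return G / np.nanstd(G, axis=0)


G = np.array([
    [0.0, 1.0],
    [1.0, 2.0],
    [2.0, np.nan],
    [1.0, 0.0],
])

standardised = scale(mean_normalise(G))
col_means = np.nanmean(standardised, axis=0)
col_vars = np.nanvar(standardised, axis=0)
```

Scaling does not shift the columns, so centring first and scaling second leaves each column with mean zero and variance 1.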