snipar.gtarray module

class snipar.gtarray.gtarray(garray, ids, sid=None, alleles=None, pos=None, chrom=None, map=None, error_probs=None, fams=None, par_status=None, ped=None)

Bases: object

Define a genotype or PGS array that stores individual IDs, family IDs, and SNP information.

Args:

garray (array): 2- or 3-dimensional numpy array of genotypes/PGS values. The first dimension is individuals. For a 2-dimensional array, the second dimension is SNPs or PGS values. For a 3-dimensional array, the second dimension indexes the genotypes of the individual and his/her relatives (for example: proband, paternal, and maternal), and the third dimension is the SNPs.
ids (array): vector of individual IDs, of the same length as the first dimension of garray
sid (array): vector of SNP IDs, equal in length, L, to the last dimension of garray
alleles (array): [L x 2] matrix of ref and alt alleles for the SNPs. L must match size of sid
pos (array): vector of SNP positions; must match size of sid
chrom (array): vector of SNP chromosomes; must match size of sid
map (array): vector of SNP genetic map positions; must match size of sid
fams (array): vector of family IDs; must match size of ids
par_status (array): [N x 2] numpy matrix that records whether parents have observed or imputed genotypes/PGS, where N matches size of ids. The first column is for the father of that individual; the second column is for the mother of that individual. If the parent is neither observed nor imputed, the value is -1; if observed, 0; and if imputed, 1.
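The shape requirements above can be illustrated with a small set of inputs. The following is a minimal NumPy-only sketch: the example values and the helper `shapes_consistent` are hypothetical and are not part of snipar, and the constructor itself is deliberately not called.

```python
import numpy as np

n_ind, n_rel, n_snp = 4, 3, 5  # individuals, (proband, father, mother), SNPs

rng = np.random.default_rng(0)
# 3-dimensional genotype array: individuals x relatives x SNPs
garray = rng.integers(0, 3, size=(n_ind, n_rel, n_snp)).astype(float)

ids = np.array(["i1", "i2", "i3", "i4"])            # one ID per individual
fams = np.array(["f1", "f1", "f2", "f3"])           # one family ID per individual
sid = np.array([f"rs{j}" for j in range(n_snp)])    # one ID per SNP
alleles = np.array([["A", "G"]] * n_snp)            # [L x 2]: ref, alt
pos = np.arange(1000, 1000 + 100 * n_snp, 100)      # one position per SNP
chrom = np.full(n_snp, 1)                           # one chromosome per SNP

# par_status: column 0 = father, column 1 = mother
# -1 = neither observed nor imputed, 0 = observed, 1 = imputed
par_status = np.array([[0, 0], [0, 0], [1, 0], [-1, 1]])


def shapes_consistent(garray, ids, sid, alleles, pos, chrom, fams, par_status):
    """Hypothetical helper: check the documented size constraints."""
    n, L = garray.shape[0], garray.shape[-1]
    return bool(
        garray.ndim in (2, 3)
        and ids.shape == (n,)
        and fams.shape == (n,)
        and par_status.shape == (n, 2)
        and np.isin(par_status, (-1, 0, 1)).all()
        and sid.shape == (L,)
        and alleles.shape == (L, 2)
        and pos.shape == (L,)
        and chrom.shape == (L,)
    )
```

Arrays that pass this check have the sizes the argument list asks for; whether the class performs similar validation itself is not stated here.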

Returns:

G : snipar.gtarray

add(garray): Adds another gtarray of the same dimensions to this array and returns the sum. IDs are matched before summing.
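Matching IDs before summing can be sketched with plain NumPy arrays. The function `add_matched` below is hypothetical, and it assumes that only individuals present in both inputs are kept; how snipar handles non-shared IDs is not documented here.

```python
import numpy as np


def add_matched(g1, ids1, g2, ids2):
    """Sum rows of g1 and g2 that belong to the same individual ID.

    Assumption: IDs are unique within each array, and only IDs found in
    both arrays are kept (returned in sorted order).
    """
    common, idx1, idx2 = np.intersect1d(ids1, ids2, return_indices=True)
    return g1[idx1] + g2[idx2], common


ids1 = np.array(["a", "b", "c"])
g1 = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
ids2 = np.array(["c", "a", "d"])
g2 = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]])

total, common = add_matched(g1, ids1, g2, ids2)
```

Here the rows for "a" and "c" are aligned by ID rather than by position, so the differing row order of the two inputs does not matter.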

compute_freqs(): Computes the allele frequencies of the SNPs and stores them in self.freqs.
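As an illustration of what an allele frequency computation can look like, the sketch below assumes diploid genotypes coded as 0, 1, or 2 copies of the counted allele, with missing values stored as NaN. The function `allele_freqs` is hypothetical, and snipar's own implementation may differ.

```python
import numpy as np


def allele_freqs(g):
    """Per-SNP allele frequency: mean genotype over non-missing calls, divided by 2."""
    g = np.ma.masked_invalid(np.asarray(g, dtype=float))
    # Columns with no observed genotypes come back as NaN
    return (np.ma.mean(g, axis=0) / 2.0).filled(np.nan)


g = np.array([[0.0, 2.0],
              [2.0, np.nan],
              [1.0, 2.0]])
freqs = allele_freqs(g)
```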

compute_info()

diagonalise(inv_root): Transforms the genotype array using the inverse square root of the phenotypic covariance matrix from the family-based linear mixed model.
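The idea of transforming by an inverse square root can be sketched for a single small covariance matrix. This is a generic whitening example under stated assumptions: the matrix `sigma`, the helper `inverse_sqrt`, and the way the transform is applied are illustrative only, and the structure snipar expects for `inv_root` is not described here.

```python
import numpy as np


def inverse_sqrt(sigma):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


# Hypothetical phenotypic covariance for one family of two individuals
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
inv_root = inverse_sqrt(sigma)

# Genotypes for the same two individuals at three SNPs
g = np.array([[0.0, 1.0, 2.0],
              [1.0, 1.0, 0.0]])
g_transformed = inv_root @ g
```

After this transform the covariance becomes the identity, since `inv_root @ sigma @ inv_root` equals the identity matrix, which is what makes the transformed model diagonal.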

fill_NAs(): Normalises the SNP columns to have mean zero, then fills NA values with zero.
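The two steps, centring and then filling, can be sketched with a NumPy masked array. The function `centre_and_fill` is hypothetical and assumes missing values are stored as NaN. Because the columns are centred first, filling with zero amounts to imputing the column mean.

```python
import numpy as np


def centre_and_fill(g):
    """Centre each column on its observed mean, then set missing entries to 0."""
    g = np.ma.masked_invalid(np.asarray(g, dtype=float))
    centred = g - np.ma.mean(g, axis=0)  # means use observed values only
    return centred.filled(0.0)


g = np.array([[0.0, 2.0],
              [2.0, np.nan],
              [1.0, 0.0]])
filled = centre_and_fill(g)
```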

filter(filter_pass)

filter_ids(keep_ids, verbose=False): Keeps only the individuals whose IDs are given in keep_ids.
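Restricting an array to a set of IDs can be sketched as a boolean mask over the first dimension. The function `keep_individuals` is hypothetical and works on plain NumPy arrays rather than on a gtarray.

```python
import numpy as np


def keep_individuals(g, ids, keep_ids):
    """Return the rows of g (and the matching IDs) whose ID is in keep_ids."""
    mask = np.isin(ids, keep_ids)
    return g[mask], ids[mask]


ids = np.array(["a", "b", "c", "d"])
g = np.arange(8.0).reshape(4, 2)
g_kept, ids_kept = keep_individuals(g, ids, ["d", "b"])
```

The mask preserves the original row order, so the result follows the order of `ids` rather than the order of `keep_ids`.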

filter_info(min_info=0.99, verbose=False)

filter_maf(min_maf=0.01, verbose=False): Filters SNPs, keeping those with minor allele frequency (MAF) greater than min_maf.

filter_missingness(max_missing=5, verbose=False): Filters SNPs, keeping those whose percentage of missing observations is less than max_missing.
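The MAF and missingness criteria can be sketched as two per-SNP boolean masks. This assumes genotypes coded 0/1/2 with NaN for missing values, and reads max_missing as a percentage. The function `snp_filter_masks` is hypothetical and is not snipar's implementation.

```python
import numpy as np


def snp_filter_masks(g, min_maf=0.01, max_missing=5):
    """Return (maf_ok, missing_ok) boolean masks, one entry per SNP column."""
    g = np.ma.masked_invalid(np.asarray(g, dtype=float))
    freqs = np.ma.mean(g, axis=0).filled(np.nan) / 2.0
    maf = np.minimum(freqs, 1.0 - freqs)  # minor allele frequency
    pct_missing = 100.0 * np.ma.getmaskarray(g).mean(axis=0)
    return maf > min_maf, pct_missing < max_missing


g = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, np.nan],
              [2.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
maf_ok, missing_ok = snp_filter_masks(g)
g_filtered = g[:, maf_ok & missing_ok]
```

In this example the second SNP is monomorphic and fails the MAF threshold, while the third has 25% missing calls and fails the missingness threshold.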

mean_normalise(): Normalises the SNP/PGS columns to have mean zero.

scale(): Scales the SNP/PGS columns to have variance 1.
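Centring and scaling can be sketched as two separate column-wise operations. The functions `mean_centre` and `unit_variance` are hypothetical. The sketch assumes complete data and the population variance (ddof=0), and it does not handle constant columns, which would divide by zero.

```python
import numpy as np


def mean_centre(x):
    """Subtract each column's mean so that every column has mean zero."""
    x = np.asarray(x, dtype=float)
    return x - x.mean(axis=0)


def unit_variance(x):
    """Divide each column by its standard deviation so that its variance is 1."""
    x = np.asarray(x, dtype=float)
    return x / x.std(axis=0)


x = np.array([[0.0, 10.0],
              [1.0, 20.0],
              [2.0, 60.0]])
standardised = unit_variance(mean_centre(x))
```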