snipar.imputation.impute_from_sibs module

Contains Cython functions for imputing the parental genotype sum from offspring and from parents (if they are observed).


get_probability_of_both_parents_conditioned_on_offsprings
get_probability_of_one_parent_conditioned_on_offsprings_and_parent
get_IBD
get_hap_index
is_possible_child
dict_to_cmap
impute_snp_from_offsprings
impute_snp_from_parent_offsprings
get_IBD_type
impute


Performs the parental sum imputation for the families in sibships and for all the SNPs in unphased_gts, and returns the results.

Inputs and outputs of this function are ASCII bytes instead of strings. If output_address is given, the imputation result is also written to that address.


A pandas DataFrame with columns [‘FID’, ‘FATHER_ID’, ‘MOTHER_ID’, ‘IID’], where the IID column holds a list of the IIDs of the individuals in that family. It only contains families with more than one child. The parental sum is computed for all of these families.


A dictionary mapping IIDs of people to their location in the bed file.
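Since the function works with ASCII bytes rather than strings, such a mapping might be built roughly as follows. This is only a sketch: the helper name build_iid_to_index and the sample IIDs are hypothetical, and rejecting duplicate IIDs is an assumption rather than documented behaviour.

```python
def build_iid_to_index(iids):
    """Map each IID, encoded as ASCII bytes, to its row index in the bed file."""
    mapping = {}
    for index, iid in enumerate(iids):
        key = iid.encode("ascii")  # keys are bytes, not str
        if key in mapping:
            # Assumption: an IID should appear only once in the bed file.
            raise ValueError(f"duplicate IID: {iid}")
        mapping[key] = index
    return mapping


# Hypothetical IIDs, listed in the order they appear in the bed file.
iid_to_index = build_iid_to_index(["sib_1", "sib_2", "sib_3"])
```

Lookups then use bytes keys, for example iid_to_index[b"sib_2"].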

phased_gts: numpy.array[signed char]

Numpy array containing the phased genotype data. Axes are individuals and SNPs respectively. Its elements should be 0 or 1, except NaN values, which should be equal to the nan_integer specified in the config.

unphased_gts: numpy.array[signed char]

Numpy array containing the unphased genotype data from a bed file. Axes are individuals, SNPs and haplotype number respectively. Its elements should be 0 or 1, except NaN values, which should be equal to the nan_integer specified in the config.


A pandas DataFrame with columns ‘ID1’, ‘ID2’ and ‘segment’. The segment column holds a list of IBD segments between ID1 and ID2. Each segment consists of a start, an end, and an IBD status. The segment list is flattened, that is, it has the form [start0, end0, ibd_status0, start1, end1, ibd_status1, …]
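The flattened layout can be turned back into (start, end, ibd_status) triples with a few lines of Python. The helper name unflatten_segments and the sample values are hypothetical and chosen only to illustrate the layout.

```python
def unflatten_segments(flat):
    """Split [start0, end0, status0, start1, ...] into (start, end, status) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("a flattened segment list must have a length divisible by 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Made-up example: three consecutive segments with IBD statuses 2, 1 and 0.
flat_segments = [0, 120, 2, 121, 300, 1, 301, 450, 0]
segments = unflatten_segments(flat_segments)
```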


A numpy array with the position of each SNP in the order of appearance in phased and unphased gts.

Other key values to be added to the HDF5 output. Usually contains:

‘bim_columns’ : Columns of the resulting bim file
‘bim_values’ : Contents of the resulting bim file
‘pedigree’ : pedigree table. Its columns are has_father, has_mother, single_parent respectively.
‘non_duplicates’ : Indexes of the unique SNPs. Imputation is restricted to them.
‘standard_f’ : Whether the allele frequencies are just population averages instead of MAFs estimated using PCs
‘MAF_*’ : info about the MAF estimator, if a MAF estimator is used.

chromosome: str

Name of the chromosome(s) that’s going to be imputed. Only used for logging purposes.

freqs: list[float]

A two-dimensional array containing estimated fs for all individuals and SNPs respectively.

output_address: str, optional

If provided, the results are written to this address in HDF5 format. In addition to all the key-value pairs in hdf5_output_dict, the following are also written to the file:

‘imputed_par_gts’ : imputed parental genotypes. This is the imputed missing parent if only one parent is missing, and the imputed average of both parents if both are missing.
‘pos’ : the positions of the SNPs (in the order of appearance in genotypes)
‘families’ : family IDs of the imputed parents (in the order of appearance in genotypes)
‘parental_status’ : a numpy array where each row shows the family status of the family in the corresponding row of families. Columns are has_father, has_mother and single_parent.
‘sib_ratio_backup’ : An array whose length is the number of SNPs. Shows the ratio of backup imputations among offspring imputations for each SNP.
‘parent_ratio_backup’ : An array whose length is the number of SNPs. Shows the ratio of backup imputations among parent-offspring imputations for each SNP.
‘mendelian_error_ratio’ : Ratio of Mendelian errors among parent-offspring pairs for each SNP
‘estimated_genotyping_error’ : estimated for each SNP using mendelian_error_ratio and MAF
‘ratio_ibd0’ : ratio of families with offspring in IBD0 to all the families.

threads: int, optional

Specifies the number of threads to be used. If None, only one thread is used.


Optional compression algorithm used in writing the output as an hdf5 file. It can be either gzip or lzf. None means no compression.


Additional settings for the optional compression algorithm. Take a look at the create_dataset function of h5py library for more information. None means no compression setting.
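In h5py, create_dataset accepts the keyword arguments compression and compression_opts. The sketch below shows one way the two settings described above could be combined into keyword arguments for that call. The helper name compression_kwargs is hypothetical, and how this module actually forwards the settings is an assumption.

```python
def compression_kwargs(compression=None, compression_opts=None):
    """Build keyword arguments for h5py's create_dataset from the two settings."""
    if compression is None:
        return {}  # None means no compression
    if compression not in ("gzip", "lzf"):
        raise ValueError("compression must be 'gzip', 'lzf' or None")
    kwargs = {"compression": compression}
    if compression_opts is not None:
        kwargs["compression_opts"] = compression_opts
    return kwargs


# gzip with compression level 4.
gzip_kwargs = compression_kwargs("gzip", 4)

# With h5py installed, the result could be used like this:
# h5_file.create_dataset("imputed_par_gts", data=array, **gzip_kwargs)
```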

half_window: int, optional

For each location i, the IBD inference for the haplotypes is restricted to [i-half_window, i+half_window].

ibd_threshold: float, optional

Minimum ratio of agreement between haplotypes for declaring IBD.
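To illustrate how half_window and ibd_threshold interact, here is a minimal sketch of a windowed agreement test between two haplotypes. It is not the module's implementation. Clipping the window at the ends of the array, skipping missing values, the placeholder nan_integer value of -1, and the use of >= for the threshold are all assumptions, and the function names are hypothetical.

```python
def window_agreement(hap_a, hap_b, i, half_window, nan_integer=-1):
    """Ratio of matching alleles in [i - half_window, i + half_window]."""
    start = max(0, i - half_window)              # assumption: clip at array start
    end = min(len(hap_a), i + half_window + 1)   # assumption: clip at array end
    compared = 0
    matches = 0
    for a, b in zip(hap_a[start:end], hap_b[start:end]):
        if a == nan_integer or b == nan_integer:
            continue  # assumption: missing values are skipped
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def is_ibd(hap_a, hap_b, i, half_window, ibd_threshold):
    """Declare IBD at location i when agreement reaches the threshold."""
    return window_agreement(hap_a, hap_b, i, half_window) >= ibd_threshold


# Made-up haplotypes that differ only at index 4.
hap_a = [0, 1, 1, 0, 1, 0, 0, 1]
hap_b = [0, 1, 1, 0, 0, 0, 0, 1]
```

With these haplotypes and half_window=1, the window around location 2 agrees fully, while the window around location 4 contains the mismatch and agrees at only two of three sites.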

silent_progress: boolean, optional

Whether it should log the percentage of the imputation’s progress.

use_backup: boolean, optional

Whether it should use backup imputation where there is no IBD information available. It is False by default.

tuple(list, numpy.array)

The first element is the family IDs of the imputed parents, in the order in which they appear in the second element. The second element is the imputed parental genotypes.