This function obtains the pseudo haplotype phase of the RNA-seq data for a given gene and aligns the major alleles across individuals.

phasing(dat, phased = FALSE, n_condition = "one")

Arguments

dat:

Bulk RNA-seq dataset for a given gene. It must contain the following variables:

  • One condition analysis:
    - `id`: character, individual identifier;
    - `ref`: numeric, the SNP-level read counts for the reference allele if the haplotype phase of the data is unknown, or the SNP-level read counts for the allele aligned on the paternal/maternal haplotype if the haplotype phase is known;
    - `total`: numeric, the SNP-level total read counts for both alleles.

  • Two conditions analysis:
    - `id`: character, individual identifier;
    - `snp`: character, the name/chromosome location of the heterozygous genetic variants;
    - `ref`: numeric, the SNP-level read counts for the reference allele if the haplotype phase of the data is unknown, or the SNP-level read counts for the allele aligned on the same paternal/maternal haplotype in both conditions if the haplotype phase is known;
    - `total`: numeric, the SNP-level total read counts for both alleles;
    - `group`: character, the condition each RNA-seq sample is obtained from (e.g., pre- vs post-treatment);
    - `ref_condition`: character, the condition used as the reference for pseudo haplotype phasing.
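
As an illustration of the two input layouts described above, the following sketch builds toy data frames for a single gene. All values, individual IDs, SNP names, and condition labels are made up, and `check_columns()` is a hypothetical helper written for this example, not part of the documented interface.

```r
# Toy input for the one-condition analysis (values are invented).
dat_one <- data.frame(
  id    = rep(c("ind1", "ind2"), each = 2),
  ref   = c(12, 30, 8, 21),
  total = c(40, 55, 25, 45),
  stringsAsFactors = FALSE
)

# Toy input for the two-condition analysis: 2 individuals x 2 SNPs x 2 conditions.
dat_two <- data.frame(
  id            = rep(c("ind1", "ind2"), each = 4),
  snp           = rep(c("chr1:1000", "chr1:2000"), times = 4),
  ref           = c(12, 30, 15, 28, 8, 21, 10, 19),
  total         = c(40, 55, 42, 50, 25, 45, 30, 41),
  group         = rep(rep(c("pre", "post"), each = 2), times = 2),
  ref_condition = "pre",  # recycled to every row
  stringsAsFactors = FALSE
)

# Hypothetical helper: confirm the required variables are present and that
# reference counts never exceed total counts.
check_columns <- function(dat, required) {
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(all(dat$ref <= dat$total))
  invisible(TRUE)
}

check_columns(dat_one, c("id", "ref", "total"))
check_columns(dat_two, c("id", "snp", "ref", "total", "group", "ref_condition"))
```

Checking the columns up front is only a convenience; it makes a missing or misnamed variable fail early with a readable message.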

phased:

A logical value indicating whether the haplotype phase of the data is known. Default is FALSE.

n_condition:

A character string indicating whether the RNA-seq data come from only one condition or from two conditions (e.g., normal vs diseased). Possible values are "one" or "two". Default is "one".

Value

The pseudo-phased RNA-seq data, with one additional column, "major", giving the read counts for the major alleles aligned across individuals.
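
A minimal usage sketch for the one-condition case follows. The input values are invented, and the call is guarded so that the snippet still runs when `phasing()` is not available in the current session; which package provides it is not stated here, so nothing is loaded explicitly.

```r
# Toy one-condition input for a single gene (values are invented).
dat <- data.frame(
  id    = rep(c("ind1", "ind2", "ind3"), each = 2),
  ref   = c(12, 30, 8, 21, 17, 9),
  total = c(40, 55, 25, 45, 33, 28),
  stringsAsFactors = FALSE
)

phased_dat <- NULL
if (exists("phasing", mode = "function")) {
  # Haplotype phase unknown, data from a single condition (the defaults).
  phased_dat <- phasing(dat, phased = FALSE, n_condition = "one")

  # Per the Value section, the result should carry an extra "major" column.
  stopifnot("major" %in% names(phased_dat))
  print(head(phased_dat))
}
```

For a two-condition data set, the same call would be made with `n_condition = "two"`, provided `dat` also contains `snp`, `group`, and `ref_condition`.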