Haplotype pseudo alignment — phasing • ASEP

Obtains the pseudo haplotype phase of the RNA-seq data for a given gene and aligns the major alleles across individuals.

phasing(dat, phased = FALSE, n_condition = "one")

Arguments

dat:

bulk RNA-seq dataset of a given gene. Must contain variables:

One condition analysis:
- `id`: character, individual identifier;
- `ref`: numeric, the SNP-level read counts for the reference allele if the haplotype phase of the data is unknown, or for the allele aligned on the paternal/maternal haplotype if the haplotype phase is known;
- `total`: numeric, snp-level total read counts for both alleles;
Two conditions analysis:
- `id`: character, individual identifier;
- `snp`: character, the name/chromosome location of the heterozygous genetic variants;
- `ref`: numeric, the SNP-level read counts for the reference allele if the haplotype phase of the data is unknown, or for the allele aligned on the same paternal/maternal haplotype in both conditions if the haplotype phase is known;
- `total`: numeric, SNP-level total read counts for both alleles;
- `group`: character, the condition each RNA-seq sample is obtained from (e.g., pre- vs post-treatment);
- `ref_condition`: character, the condition used as the reference for pseudo haplotype phasing;
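
The column requirements above can be illustrated with a minimal sketch. The counts, individual names, and SNP names below are invented toy values, and `check_phasing_input()` is a hypothetical helper that is not part of ASEP; it only checks that the documented columns are present and that `ref` does not exceed `total`.

```r
# Toy one-condition input: invented counts, for illustration only.
dat_one <- data.frame(
  id    = rep(c("ind1", "ind2"), each = 3),
  ref   = c(12, 30, 8, 25, 7, 19),
  total = c(40, 45, 20, 50, 21, 33),
  stringsAsFactors = FALSE
)

# Toy two-condition input: each individual has two SNPs measured
# in both the "pre" and "post" conditions; "pre" is the reference condition.
dat_two <- data.frame(
  id    = rep(c("ind1", "ind2"), each = 4),
  snp   = rep(c("snp1", "snp2"), times = 4),
  ref   = c(12, 30, 15, 28, 25, 7, 22, 9),
  total = c(40, 45, 38, 50, 50, 21, 47, 24),
  group = rep(rep(c("pre", "post"), each = 2), times = 2),
  ref_condition = "pre",
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT an ASEP function): verifies the documented columns.
check_phasing_input <- function(dat, n_condition = c("one", "two")) {
  n_condition <- match.arg(n_condition)
  required <- if (n_condition == "one") {
    c("id", "ref", "total")
  } else {
    c("id", "snp", "ref", "total", "group", "ref_condition")
  }
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Reference-allele counts cannot exceed the total counts at a SNP.
  stopifnot(is.numeric(dat$ref), is.numeric(dat$total),
            all(dat$ref <= dat$total))
  invisible(TRUE)
}

check_phasing_input(dat_one, "one")
check_phasing_input(dat_two, "two")
```

Running a check like this before calling `phasing()` catches a missing or misnamed column early, which matters most in the two-condition case, where six columns are expected.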

phased:

a logical value indicating whether the haplotype phase of the data is known. Default is FALSE.

n_condition:

a character string indicating whether the RNA-seq data comes from a single condition or from two conditions (e.g., normal vs diseased). Possible values are "one" or "two". Default is "one".

Value

The pseudo-phased RNA-seq data, with one additional column, "major", giving the read counts for the major alleles aligned across individuals.
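
A call might look like the sketch below. It assumes the ASEP package is installed and that `phasing()` is available as `ASEP::phasing`, as the usage line suggests; the input counts are invented toy values containing only the columns documented for the one-condition analysis. The call is guarded so the snippet still runs when the package is absent.

```r
# Toy one-condition input with unknown haplotype phase (invented counts).
dat <- data.frame(
  id    = rep(c("ind1", "ind2"), each = 3),
  ref   = c(12, 30, 8, 25, 7, 19),
  total = c(40, 45, 20, 50, 21, 33),
  stringsAsFactors = FALSE
)

# Only call phasing() if ASEP is installed (assumption: it is exported).
if (requireNamespace("ASEP", quietly = TRUE)) {
  phased_dat <- ASEP::phasing(dat, phased = FALSE, n_condition = "one")
  # Per the Value section, the result should carry an extra "major" column.
  print("major" %in% names(phased_dat))
  print(head(phased_dat))
}
```

For data from two conditions, the same call would use `n_condition = "two"` with the additional `snp`, `group`, and `ref_condition` columns present in `dat`.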