# Assessing the Genetic Component of a Phenotype

A phenotype is the appearance of an individual, which results from the interaction of the person's genetic makeup and their environment. Phenotypes can be categorical or numerical. If we are interested in the genetic component of a trait there are different methods we can use for analysis.

We define a trait for analysis, determine what we are interested in studying, and then choose the methods to use. For example, if we want to know how BMI is linked to gene effects, we need to account for other factors such as age, sex, and smoking. Thus, a [Multiple Linear Regression](https://bookstack.mitchellhenschel.com/books/multivariable-analysis/page/mutiple-linear-regression-and-estimation) might be appropriate, where we can measure the variability attributable to each factor.

#### Variance of Phenotypic Traits

**𝜎<sup>2</sup><sub>T</sub>** = The observed (phenotypic) variability of a trait  
**𝜎<sup>2</sup><sub>T</sub> = 𝜎<sup>2</sup><sub>G</sub> + 𝜎<sup>2</sup>**<sub>**E**</sub> = The phenotypic variability can be partitioned into variability due to **genetic** and **environmental** effects  
**𝜎<sup>2</sup><sub>G</sub> = 𝜎<sup>2</sup><sub>A</sub> + 𝜎<sup>2</sup>**<sub>**D**</sub> = The genetic component can be further partitioned into **additive** and **dominance** genetic variance

We can write a model for the trait as:   
**T = (A + D) + E = G + E**

Assumptions of this model:

- Genetic and environmental factors are uncorrelated
- Standard deviation of trait is the same for all individuals
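The partition can be checked numerically. The following is a minimal simulation sketch, not an analysis of real data: the function names, the component standard deviations, and the sample size are all illustrative assumptions. It draws G and E independently and compares the variance of T with the sum of the component variances.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def simulate_trait(n, sd_g, sd_e, seed=0):
    """Draw independent genetic (G) and environmental (E) effects and form T = G + E."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, sd_g) for _ in range(n)]
    e = [rng.gauss(0.0, sd_e) for _ in range(n)]
    t = [gi + ei for gi, ei in zip(g, e)]
    return g, e, t

# Hypothetical values: sd_g = 2 and sd_e = 1, so the true variances are 4 and 1.
g, e, t = simulate_trait(n=50_000, sd_g=2.0, sd_e=1.0)
var_g, var_e, var_t = variance(g), variance(e), variance(t)

# Because G and E are uncorrelated, var_t should be close to var_g + var_e.
print(round(var_g, 2), round(var_e, 2), round(var_t, 2))
```

If G and E were correlated, Var(T) would pick up an extra 2 \* Cov(G, E) term and the simple partition would no longer hold.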

#### Additive and Dominance Components

Consider a single locus with two alleles, B and b, where B raises the trait value and b lowers it. The frequency distributions of trait values are shown on a continuous scale, shifted so that the midpoint between the mean of BB (+a) and the mean of bb (-a) is 0:

[![image-1664374671516.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664374671516.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664374671516.png)

d is the mean of the Bb group

In an **additive** model d = 0 (no dominance; dominance variance = 0)  
In a **recessive model**: d = -a (Bb would overlap the bb distribution)  
In a **dominant model** d = +a (Bb would overlap the BB distribution)

<p class="callout success">The **degree of dominance** can be defined as d/a</p>
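To make the link between a, d, and the variance components concrete, here is a small sketch using the standard single-locus formulas. These formulas are not stated in the notes above; they assume Hardy-Weinberg genotype frequencies, and the allele frequency `p` and the function names are introduced here for illustration only.

```python
def single_locus_variances(p, a, d):
    """Additive and dominance variance at one biallelic locus.

    p is the frequency of allele B; genotype means are BB = +a, Bb = d, bb = -a.
    Assumes Hardy-Weinberg genotype frequencies (an assumption added here).
    """
    q = 1.0 - p
    alpha = a + d * (q - p)            # average effect of an allele substitution
    var_a = 2.0 * p * q * alpha ** 2   # additive genetic variance
    var_d = (2.0 * p * q * d) ** 2     # dominance genetic variance
    return var_a, var_d

def degree_of_dominance(a, d):
    """Degree of dominance d/a: 0 additive, +1 dominant, -1 recessive."""
    return d / a

for label, d in [("additive", 0.0), ("dominant", 1.0), ("recessive", -1.0)]:
    var_a, var_d = single_locus_variances(p=0.5, a=1.0, d=d)
    print(label, degree_of_dominance(1.0, d), var_a, var_d)
```

With d = 0 the dominance variance vanishes, matching the additive model above; any nonzero d contributes dominance variance.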

### Heritability

The **heritability** of a trait is the proportion of total phenotypic variance that is due to genetic effects.

Heritability can be defined as:

<p class="callout success">***h<sup>2</sup>* = 𝜎<sup>2</sup><sub>G</sub> / 𝜎<sup>2</sup><sub>T </sub>= ( 𝜎<sup>2</sup><sub>A</sub> + 𝜎<sup>2</sup><sub>D</sub> ) / ( 𝜎<sup>2</sup><sub>A</sub> + 𝜎<sup>2</sup><sub>D</sub> + 𝜎<sup>2</sup><sub>E</sub> )**</p>

The above formula is also called broad-sense heritability. Narrow-sense heritability uses just the additive component:

*h<sub>**n**</sub><sup>2</sup>* **=**  **𝜎<sup>2</sup><sub>A</sub> / 𝜎<sup>2</sup><sub>T </sub>**
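As a quick worked example of the two definitions, the sketch below computes both from variance components. The component values are made up for illustration and the function name is hypothetical.

```python
def heritability(var_a, var_d, var_e):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    var_t = var_a + var_d + var_e          # total phenotypic variance
    broad = (var_a + var_d) / var_t        # all genetic variance over total
    narrow = var_a / var_t                 # additive variance only over total
    return broad, narrow

# Hypothetical components: additive 4, dominance 1, environmental 5.
broad, narrow = heritability(var_a=4.0, var_d=1.0, var_e=5.0)
print(broad, narrow)  # 0.5 0.4
```

The two values coincide only when the dominance variance is 0.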

We use the expected resemblance among relatives to estimate h<sup>2</sup>. It is a function of covariance between relatives and coefficient of relationship (AKA "Additive coefficient")

**The additive coefficient of a relationship C is the expected proportion of alleles shared IBD by a relative pair**, defined as:

<p class="callout success">C = 2<sup>-R</sup>, where R is the degree of relationship</p>

- R = 0: MZ twins -&gt; C = 1
- R = 1: 1st degree relationship; sib, parent-offspring -&gt; C = 1/2
- R = 2: 2nd degree relatives: half-sibs, grandparent-grandchild, avuncular -&gt; C = 1/4
- R = 3: 1st cousins -&gt; C = 1/8

[![image-1664378503599.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664378503599.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664378503599.png)

Recall that sharing an allele **Identical-By-Descent** (IBD) means the relatives carry the exact same copy of an allele, inherited from a common ancestor.

[![image-1664378809549.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664378809549.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664378809549.png)

The additive coefficient is the expected proportion of alleles shared IBD by the pair, so we can also define it as

[![image-1664379231292.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664379231292.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664379231292.png)

p(x) = x/2 = the proportion of shared alleles

For example, the parent-child relationship would have an additive coefficient of 0\*(0/2) + 1\*(1/2) + 0\*(2/2) = 1/2

The **kinship coefficient** is the probability that two alleles, one drawn at random from each individual in a pair, are IBD. It is always half of the coefficient of relationship.
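The expected-sharing definition can be sketched in a few lines. The parent-offspring probabilities come from the example above; the IBD probabilities for MZ twins, full sibs, and half sibs are the standard textbook values and are not quoted from these notes. Function names are hypothetical.

```python
from fractions import Fraction

def additive_coefficient(ibd_probs):
    """ibd_probs = (P(share 0), P(share 1), P(share 2)) alleles IBD.

    Returns the expected proportion of alleles shared: sum of P(x) * x / 2.
    """
    return sum(prob * Fraction(x, 2) for x, prob in enumerate(ibd_probs))

def kinship_coefficient(ibd_probs):
    """Half of the additive coefficient (coefficient of relationship)."""
    return additive_coefficient(ibd_probs) / 2

# Standard IBD sharing probabilities (assumed textbook values).
IBD = {
    "MZ twins": (Fraction(0), Fraction(0), Fraction(1)),
    "parent-offspring": (Fraction(0), Fraction(1), Fraction(0)),
    "full sibs": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
    "half sibs": (Fraction(1, 2), Fraction(1, 2), Fraction(0)),
}

for name, probs in IBD.items():
    print(name, additive_coefficient(probs), kinship_coefficient(probs))
```

The results agree with C = 2<sup>-R</sup>: 1 for MZ twins, 1/2 for first-degree pairs, and 1/4 for half sibs.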

### Estimating Heritability

#### Using Co-variance

Recall the properties of covariance:

[![image-1664379863104.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664379863104.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664379863104.png)

- Cov(X,Y) = Cov(Y,X)
- Cov(X, X) = var(X)
- Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)
- Cov(cX, Y) = c\*Cov(X, Y); where c is some constant
- The unit of covariance is xy
- Positive covariance: Value of x tends to be high when value of y is high
- Negative covariance: Value of x tends to be high when value of y is low

A standardized measure of covariance is correlation which is a scale of -1 to 1

[![image-1664380038270.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664380038270.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664380038270.png)

We can use the properties of covariance to determine the covariance between a quantitative measure on a parent and on an offspring. This follows the same T = (A + D) + E = G + E concept as above:

Cov(Parent, Offspring) = Cov (A<sub>p</sub> + D<sub>p</sub> + E<sub>p</sub>, A<sub>o</sub> + D<sub>o</sub> + E<sub>o</sub>)

In this example, we can break this down into:

- Cov (A<sub>p</sub>, A<sub>o</sub>) = Additive covariance, offspring always inherits exactly 1 allele from parent so = (1/2)𝜎<sup>2</sup><sub>A</sub>
- Cov (E<sub>p</sub>, E<sub>o</sub>), covariance of environmental component between parent and offspring is assumed to be 0
- Cov (D<sub>p</sub>, D<sub>o</sub>) = Dominance covariance; since offspring do not inherit pairs of alleles from a parent, this is also 0
- The cross terms, such as Cov(A, D) and Cov(A, E), which we assume to be 0

Note: This only works if the assumptions of independence are true and the standard deviation is the same for all individuals.

Since we are assuming the standard deviation is the same across generations, we can write the heritability estimated from parent and offspring in terms of their correlation:

***h<sub>n</sub><sup>2 </sup>*= 𝜎<sup>2</sup><sub>A</sub> / 𝜎<sup>2</sup><sub>T </sub>= 2 \* Cov(parent, offspring) / (SD(parent) \* SD(offspring)) = 2 \* Cor(parent, offspring)**

Thus, we can estimate narrow-sense heritability using 2 times the observed correlation between parents and offspring.
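A minimal sketch of this estimate is shown here. The six parent-offspring pairs are made-up trait values chosen so that both generations have the same spread; a real estimate would need far more pairs. The function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson (product-moment) correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical trait values for six parent-offspring pairs.
parent = [160, 165, 170, 175, 180, 185]
offspring = [175, 160, 185, 165, 170, 180]

r_po = pearson_r(parent, offspring)
h2_narrow = 2 * r_po   # narrow-sense heritability estimate
print(round(r_po, 3), round(h2_narrow, 3))  # 0.2 0.4
```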

#### Using Linear Regression

Alternatively, we can use the linear regression coefficient to estimate heritability, where Y is the offspring value and X is the parent value:

𝑌 = 𝛼 + 𝛽𝑋 + 𝐸

𝛽\_hat = cor(X, Y) \* SD(Y) / SD(X)

[![image-1664384066843.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664384066843.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664384066843.png)

***h<sub>n</sub><sup>2 </sup>*= 2\*r<sub>PO </sub>= 2\*𝛽\_hat<sub>PO</sub>**

Note that the estimate obtained from beta may differ from the estimate obtained from r if the standard deviations of parents and offspring are not equal.
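The difference between the two estimates can be seen in a small sketch. The data below are made up, with offspring deliberately less variable than parents, and the function name is hypothetical.

```python
import math

def slope_and_correlation(x, y):
    """Return the OLS slope of y on x and the Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    beta_hat = sxy / sxx                # slope = r * SD(y) / SD(x)
    r = sxy / math.sqrt(sxx * syy)      # correlation
    return beta_hat, r

# Hypothetical data: SD(offspring) is much smaller than SD(parent).
parent = [160, 165, 170, 175, 180, 185]
offspring = [172, 166, 174, 169, 178, 173]

beta_hat, r = slope_and_correlation(parent, offspring)
print("h2 from slope:", round(2 * beta_hat, 3))
print("h2 from correlation:", round(2 * r, 3))
```

Here the slope-based estimate is the smaller of the two because SD(Y)/SD(X) is below 1; with equal standard deviations the two would agree.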

If both parents are available we can regress the offspring value on the mean parental phenotype value (the midparent value):

[![image-1664384675608.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664384675608.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664384675608.png)

***h<sup>2 </sup>*= 𝛽\_hat<sub>MO </sub>**

Beta estimates heritability directly in the average parent version. I'm not going to show the math here, but know this is in part because SD(average parent) = SD(parent) / sqrt(2)
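The midparent result can be checked by simulation. This is a sketch under stated assumptions that are not in the notes: a purely additive trait, unrelated parents (random mating), no shared environment, and an offspring additive value equal to the parental average plus a Mendelian segregation term with variance 𝜎<sup>2</sup><sub>A</sub>/2. All names and parameter values are hypothetical.

```python
import random

def simulate_families(n, var_a, var_e, seed=1):
    """Simulate midparent and offspring trait values for n families."""
    rng = random.Random(seed)
    sd_a, sd_e = var_a ** 0.5, var_e ** 0.5
    sd_seg = (var_a / 2) ** 0.5   # Mendelian segregation SD
    midparent, offspring = [], []
    for _ in range(n):
        a_mother, a_father = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        t_mother = a_mother + rng.gauss(0, sd_e)
        t_father = a_father + rng.gauss(0, sd_e)
        a_child = (a_mother + a_father) / 2 + rng.gauss(0, sd_seg)
        midparent.append((t_mother + t_father) / 2)
        offspring.append(a_child + rng.gauss(0, sd_e))
    return midparent, offspring

def ols_slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# True h2 = 0.6 / (0.6 + 0.4) = 0.6 under these hypothetical components.
midparent, offspring = simulate_families(n=20_000, var_a=0.6, var_e=0.4)
beta_mo = ols_slope(midparent, offspring)
print(round(beta_mo, 2))
```

The slope lands near the simulated heritability without doubling, because Cov(offspring, midparent) = 𝜎<sup>2</sup><sub>A</sub>/2 and Var(midparent) = 𝜎<sup>2</sup><sub>T</sub>/2.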

##### Example with Sibling Pairs

Looking at the relationship table above in the siblings row: the additive coefficient is 1/2 and the dominance coefficient is 1/4. If data are available on N sibling pairs only:

Cov(Sibling 1, Sibling 2) = (1 / 2) 𝜎<sup>2</sup><sub>A</sub> + (1 / 4) 𝜎<sup>2</sup><sub>D</sub>

If 𝜎<sup>2</sup><sub>D</sub> = 0, 2 times the intraclass correlation (ICC) is an estimate of heritability:[![image-1664386082651.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664386082651.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664386082651.png)  
Where x and SD(x) are computed combining data on all siblings.

If 𝜎<sup>2</sup><sub>D</sub> does not equal 0, this estimate is between the narrow and broad heritability because:

[![image-1664386233273.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664386233273.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664386233273.png)

#### Intraclass Correlation Vs. Pearson (Product-Moment) Correlation

When pairs consist of individuals of two different classes (grandparent-grandchild, parent-offspring) we call this **pairwise correlation** and we can use a simple Pearson correlation coefficient:

[![image-1664386482464.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664386482464.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664386482464.png)

But when the pairs have no obvious order (siblings or cousins), the intraclass correlation is used:

[![image-1664386537312.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664386537312.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664386537312.png)  
where n is the number of pairs and x\_bar is the mean value across all individuals
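A sketch of the pairwise ICC for sibling pairs follows. It assumes the pooled-mean, pooled-variance form described by the note under the figure (mean and SD computed across all 2n individuals); the sibling values are made up and the function name is hypothetical.

```python
def pairwise_icc(pairs):
    """Intraclass correlation for unordered pairs.

    The mean and variance are pooled over all 2n individuals, so swapping
    the two members of any pair leaves the result unchanged.
    """
    n = len(pairs)
    values = [v for pair in pairs for v in pair]
    mean = sum(values) / (2 * n)
    pooled_var = sum((v - mean) ** 2 for v in values) / (2 * n)
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / n
    return cov / pooled_var

# Hypothetical trait values for five sibling pairs.
sib_pairs = [(170, 174), (160, 165), (182, 178), (175, 169), (166, 171)]
icc = pairwise_icc(sib_pairs)
print("ICC:", round(icc, 3), "h2 estimate:", round(2 * icc, 3))
```

Unlike the Pearson coefficient, the result does not depend on which sibling is listed first in each pair.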

The main difference between the product-moment correlation and ICC:

- ICC: SD in the denominator is pooled across individuals
- Pearson: SD for the two types of relatives are computed separately

ICC can be extended to sets of 3 or more. The formula for ICC is:

[![image-1664386880153.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664386880153.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664386880153.png)

k = number of siblings per sibship  
MSB = Mean Square Between (model)  
MSW = Mean Square Within (error)  
If the sibships are of unequal size using the average sibship size will provide an approximate h<sup>2</sup> estimate

<p class="callout success">h<sup>2</sup> = 2 \* ICC  
</p>
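For sibships of equal size k, the ANOVA version can be sketched as below. It uses the standard one-way form ICC = (MSB - MSW) / (MSB + (k - 1) \* MSW), which is assumed to match the figure above; the data and function name are hypothetical.

```python
def anova_icc(sibships):
    """One-way ANOVA intraclass correlation for equal-size sibships."""
    k = len(sibships[0])
    if any(len(s) != k for s in sibships):
        raise ValueError("this sketch assumes equal sibship sizes")
    n = len(sibships)
    grand_mean = sum(sum(s) for s in sibships) / (n * k)
    means = [sum(s) / k for s in sibships]
    # Mean square between sibships (model) and within sibships (error).
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum((v - m) ** 2 for s, m in zip(sibships, means) for v in s) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical trait values for four sibships of three siblings each.
sibships = [[170, 174, 172], [160, 165, 163], [182, 178, 181], [175, 169, 171]]
icc = anova_icc(sibships)
print("ICC:", round(icc, 3), "h2 estimate:", round(2 * icc, 3))
```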

#### Twin Studies

Monozygotic (MZ) twins are genetically identical, while dizygotic (DZ) twins are genetically equivalent to full siblings, sharing half their genes on average. In studies with both MZ and DZ twins, a common estimate of heritability is:

h<sup>2</sup> = 2 \* (ICC<sub>MZ </sub>- ICC<sub>DZ</sub>)

If 𝜎<sup>2</sup><sub>D</sub> does not equal 0, this estimate is greater than both the narrow and broad heritability because:

[![image-1664387386880.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664387386880.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664387386880.png)

This is in part because we cannot assume there is no shared environmental variance in siblings and especially twins.
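The effect of dominance variance on the twin estimate can be illustrated with expected correlations. This sketch assumes no shared environmental variance and uses the coefficients (1, 1) for MZ twins and (1/2, 1/4) for DZ twins; the variance components and function names are made up for illustration.

```python
def expected_twin_iccs(var_a, var_d, var_e):
    """Expected MZ and DZ correlations, assuming no shared environment."""
    var_t = var_a + var_d + var_e
    icc_mz = (var_a + var_d) / var_t                 # MZ twins share all of A and D
    icc_dz = (0.5 * var_a + 0.25 * var_d) / var_t    # DZ twins resemble full sibs
    return icc_mz, icc_dz

def twin_h2(icc_mz, icc_dz):
    """Twin-based heritability estimate: 2 * (ICC_MZ - ICC_DZ)."""
    return 2 * (icc_mz - icc_dz)

# Hypothetical components: additive 4, dominance 2, environmental 4.
icc_mz, icc_dz = expected_twin_iccs(var_a=4.0, var_d=2.0, var_e=4.0)
estimate = twin_h2(icc_mz, icc_dz)
print(round(estimate, 3))  # 0.7, versus narrow 0.4 and broad 0.6
```

The estimate equals (𝜎<sup>2</sup><sub>A</sub> + 1.5 𝜎<sup>2</sup><sub>D</sub>) / 𝜎<sup>2</sup><sub>T</sub>, which exceeds broad-sense heritability whenever the dominance variance is positive.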

#### Notes on Estimating Heritability

- We can obtain different estimates using different subsets of data
- Differences in estimates may be due to "sampling variation"
- If the dominance variance is not 0, some differences may be expected 
    - Siblings and twins incorporate dominance variance; parent-offspring do not
- Siblings and twins tend to share environment/exposure factors
- We may get smaller estimates with pairs that are more similar (Age, weight, etc)
- Precision of heritability estimates depends on the SE of the correlation coefficient or regression coefficient used for the estimation
- Large sample sizes are needed to get precise measures of heritability
- Heritability is a ratio of variances, not an average
- The law of large numbers does not necessarily apply here; heritability estimates do not necessarily follow a normal distribution
- A maximum likelihood approach can be used to estimate 𝜎<sup>2</sup><sub>T</sub>, 𝜎<sup>2</sup><sub>A</sub>, and 𝜎<sup>2</sup><sub>D</sub>, and can also estimate shared environmental variance when we do not assume it to be 0

We can fit a linear mixed-effects (variance component) model:

[![image-1664388948635.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664388948635.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664388948635.png)  
In which case we would estimate heritability as 𝜎<sup>2</sup><sub>G</sub> / 𝜎<sup>2</sup><sub>T</sub> and test H0: 𝜎<sup>2</sup><sub>G</sub> = 0

### Heritability Summary

- Heritability is a population parameter, defined as the proportion of trait variance explained by additive genetic factors 
    - Estimates of h<sup>2</sup> are usually based on resemblance between relatives
    - Highly heritable -&gt; close relatives should be more correlated
- Heritability may differ by population 
    - h<sup>2</sup> depends on both 𝜎<sup>2</sup><sub>T </sub> and 𝜎<sup>2</sup><sub>A</sub>
- h<sup>2</sup> estimates using close and distant relatives should be similar 
    - Estimates based on twins, sibs and cousin pairs estimate the same population parameter (unless there is dominance variance)
- If a trait is not heritable (h<sup>2</sup> = 0), we do not expect close relatives to be more correlated than distant relatives
- Heritability is generally computed for quantitative traits with analysis of variance, but can also be estimated for discrete traits 
    - Liability, or predisposition to disease, is measured on a quantitative scale
    - Threshold model: trait present if liability to disease exceeds a certain threshold
    - Heritability is estimated from the correlation in liability between relatives
    - There is specialized software for calculating this[![image-1664472034875.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1664472034875.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1664472034875.png)
    - Referred to as the "latent variable" model: we cannot observe liability itself, only whether an individual has crossed the threshold
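The threshold model can be sketched with a simulation. It assumes liability follows a standard normal distribution, which is a common convention but is not stated in the notes; the prevalence, sample size, and function names are hypothetical.

```python
import random
from statistics import NormalDist

def liability_threshold(prevalence):
    """Threshold on a standard normal liability scale giving this prevalence."""
    return NormalDist().inv_cdf(1.0 - prevalence)

def simulate_affection(n, prevalence, seed=2):
    """Return a list of True/False disease statuses for n individuals."""
    rng = random.Random(seed)
    threshold = liability_threshold(prevalence)
    # Liability is latent; only whether it exceeds the threshold is observed.
    return [rng.gauss(0.0, 1.0) > threshold for _ in range(n)]

threshold = liability_threshold(0.10)
status = simulate_affection(n=20_000, prevalence=0.10)
observed_prevalence = sum(status) / len(status)
print(round(threshold, 2), round(observed_prevalence, 2))
```

The observed proportion of affected individuals recovers the chosen prevalence, even though the underlying liabilities are never reported.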

### Recurrence Risk Ratio

The **recurrence risk ratio** assesses familial aggregation of dichotomous traits (diseases). **𝜆<sub>𝑅 </sub>** denotes the relative risk, with a subscript indicating the specific type of relative examined.

A **proband** is a subject selected for a study due to disease status. In familial disease studies, it is the affected subject who brought the family into the study.

<p class="callout success">**𝜆<sub>𝑅 </sub>= 𝐾<sub>𝑅</sub>/𝐾**  
𝐾<sub>𝑅</sub> = P(disease | relative of type R is affected) = the proportion (prevalence) of relatives of affected probands that are affected with disease  
K = the proportion of affected individuals in the general population  
</p>
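As a worked example of the ratio, the sketch below uses made-up counts: 30 affected among 200 siblings of probands, and a population prevalence of 5%. The function name is hypothetical.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, population_prevalence):
    """lambda_R = K_R / K, with K_R estimated from relatives of affected probands."""
    k_r = affected_relatives / total_relatives
    return k_r / population_prevalence

# Hypothetical counts for siblings of probands.
lambda_sib = recurrence_risk_ratio(
    affected_relatives=30, total_relatives=200, population_prevalence=0.05
)
print(round(lambda_sib, 2))  # 3.0
```

A value of 3 would mean siblings of affected individuals are three times as likely to be affected as a member of the general population.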

General rule:

- If a trait is genetic, **𝜆<sub>𝑅 </sub>**should decrease as the degree of relationship increases
- Differences in shared environment may lead to differences in **𝜆<sub>𝑅 </sub>**for different relative types of the same degree
- **𝜆<sub>𝑅 </sub>**may be greater than 1 even when the trait does not have a genetic component, due to a shared environment