# Bayesian Logistic Regression

Let Y be an indicator for presence of disease -&gt; Y | p ~ Bin(p, 1)

Logistic regression models the log-ODDS (also called logit) of the mean outcome using a linear predictor.  
[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/2Rgimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/2Rgimage.png)

[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/TCPimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/TCPimage.png)  
In a case-control study beta\_0 is not interpretable. But the estimate of log(OR) can be interpreted correctly, given the symmetry of roles of disease/exposure

H<sub>0</sub>: OR = 1; Log(OR) = 0; No association  
H<sub>1</sub>: OR != 1; Log(OR) != 0; Significant association

The Frequentist approach to test the above is to fit a logistic regression in R using the function glm() and assess the OR.

In Bayesian formulation we need to identify the parameters in our prior model, and the parameters we wish to estimate to form a posterior distribution to assess the OR. We can compute the probability that OR &gt; 1, &lt; 1 or != 1 using a small range.

Ex. Below we model 2 parameters, beta\_0 and beta\_1, in a DAG model:  
[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/9smimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/9smimage.png)

Because there's no constraint on the beta regression parameters, it is common to assume a normal distribution.  
The magnitude of variance reflects the amount of uncertainty

In the below code:

- We loop to specify each observation
- Y.D\[i\] = 
    - 1 if subject i has the disease
    - 0 if subject i does not have the disease
- X.E\[i\] = 
    - 1 if subject i has been exposed
    - 0 if subject i has not been exposed

```R
# Analysis with JAGS
library(rjags)

model.1 <- "model{
### data model
for(i in 1:N){
    Y.D[i] ~ dbin(p[i], 1)
    logit(p[i]) <- beta_0 + beta_1*X.E[i]
}
OR <- exp(beta_1)
pos.prob <- step(beta_1)
### prior
beta_1 ~ dnorm(0,0.0001)
beta_0 ~ dnorm(0,0.0001)
 0 i th subject unexposed
}"
# In R Data stored as a list
data.1 <- list(N = 331, X.E = X.E, Y.D = Y.D)
# Compile model (data is part of it!); `adapt` for 2,000 samples
model_odds <- jags.model(textConnection(model.1), data = data.1,
n.adapt = 2000)
update(model_odds, n.iter = 5000) # 5,000 burn-in samples
# Get 10,000 samples from the posterior distribution of OR, beta_0,beta_1
test_odds <- coda.samples(model_odds, c("OR", "beta_1", "beta_0"), n.iter = 10000)
plot(test_odds)
summary(test_odds)
```

In the output, we'll often look for the point estimate to be the posterior mean. However, when posterior density is skewed, posterior median is sometimes preferable.

#### Asymptotic Approximation

When the sample size is large enough, one can use an asymptotic approximation of the posterior distribution of the parameters.

When the sampling distribution is a member of the exponential family (Normal, Binomial, Poisson, or Gamma), then the posterior distribution of the parameters is (approximately) normally distribution:  
[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/r6simage.png)\[1\]](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/r6simage.png)  
beta\_hat is the ML (maximum likelihood) estimate of beta  
I(beta\_hat)^-1 is the Fisher Information matrix, evaluated in the ML estimate of beta

Based on the Maximum likelihood theory, asymptotically:  
[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/tssimage.png)\[2\]](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/tssimage.png)

We don't need a prior in \[1\] because we have a lot of data. Results in \[1\] and \[2\] are close because the prior is weighted low because of the high sample size.

### Gibbs Sampling for Logistic Regression

[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/woLimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/woLimage.png)

##### Posterior Inference

Obtain s + m sample for each parameter  
Throw away the first s values (burn-in)  
Use the m samples with Monte Carlo algorithms to carry out inference:

1. Point Estimates - Posterior means or medians  
    [![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/HLmimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/HLmimage.png)
2. Transformations - Any function of the parameters  
    [![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/TuPimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/TuPimage.png)
3. Posterior Density - Density estimation methods to estimate the posterior density of parameters
4. Tail probabilities  
    [![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/9rDimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/9rDimage.png)

#### Marginal and Conditional Independence  


Two random variables, X and Y, are independent (denoted by X ⊥ Y) if an only if:  
P(X, Y) = P(X)P(Y); Joint p.m.f. for joint density = Marginal p.m.f. or marginal density

Given three random variables, X, Y and X, are conditionally independent given Z (denoted by Y ⊥ Y | Z) if and only if:  
P(X, Y | Z) = P(X | Z) P(Y | Z)

There is no relation between marginal and conditional independence (Simpson Paradox)

The directed graph specifies conditional and marginal independence through two properties; The **Local Markov Property (LMP)** specifies that we need to specify only the distribution of nodes given their parents. The **Global Markov Property (GMP)** is used internally to derive the conditional distributions of each node given all others.

#### Local Markov Property (LMP)  


Descendants of Y \[D(Y)\] are all nodes reached from Y with a directed path  
Non-descendants of Y \[NS(Y)\] - all nodes minus the descendant nodes

[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/IHMimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/IHMimage.png)

#### Global Markov Property (GMP)

The node is independent of everything else in the network given its Markov Blanket.

For each node, list the **Markov blanket:** Markov blanket of A: the parents of A, the children of A and the parents of the children of A

[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/5iqimage.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/5iqimage.png)

Y\_1 independent of Y\_3, Y\_4, Y\_5, given Y\_2  
MB(Y\_1) = { Y\_2 }; MB(Y\_2) = { Y\_1, Y\_3, Y\_4 }

<p class="callout info">The LMP is used to specify the priors and the likelihood  
The GMP is used to generate the conditional distributions to run the Gibbs Sampling</p>

#### Bayesian Hypothesis Testing: Prior Odds

On each hypothesis we have a prior probability:  
P(H<sub>0</sub>) = P(M<sub>0</sub>) and P(H<sub>a</sub>) = P(M<sub>1</sub>)

Use the data to compute the posterior probability of each hypothesis:  
[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/scaled-1680-/cd5image.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-01/cd5image.png)

This equates to:  
Posterior ODDs = Bayes Factor \* Prior Odds  
We accept the hypothesis with maximum posterior probability (0 - 1 loss)

The Bayes Factor is the ratio of the likelihood functions computed for models M0 and M1

If P(M0) = P(M1) then the posterior probability of M1 is larger than the posterior probability of M0

We can also assign weights to adjust the probability of making errors:  
[![image.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-02/scaled-1680-/image.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2023-02/image.png)  
Where alpha would be the loss incurred if M1 is accepted when M0 is true  
and beta is the loss incurred if M0 is accepted when M1 is true.