
Module 5: Multivariate Normal Distribution

A variable X follows a discrete probability distribution if the possible values of X are either:

  • A finite set
  • A countably infinite sequence

p_X(x_i) = P(X = x_i) is called the probability mass function (PMF)

  • p_X(x_i) >= 0, as it is a probability
  • The PMF summed over all values of X equals 1

Recall that in a Discrete Probability Distribution:

P(a ≤ X ≤ b) = Σ p_X(x), summing over all support values x with a ≤ x ≤ b

In a Continuous Probability Distribution:

P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx, and P(X = x) = 0 for any single value x

This is because for a discrete variable only the values in its support carry probability; we are not concerned with the values in between them.
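As a quick numerical illustration of the sum-versus-integral distinction, here is a minimal sketch in Python (scipy and the specific numbers are assumptions for illustration, not part of these notes):

    # Discrete: probabilities come from summing the PMF (fair six-sided die)
    from scipy.stats import norm

    pmf = {x: 1/6 for x in range(1, 7)}
    p_discrete = sum(pmf[x] for x in range(2, 5))   # P(2 <= X <= 4) = 0.5

    # Continuous: probabilities come from integrating the PDF (standard normal),
    # and any single point has probability zero
    p_continuous = norm.cdf(1) - norm.cdf(-1)       # P(-1 <= X <= 1) ~ 0.6827
    p_single_point = norm.cdf(1) - norm.cdf(1)      # P(X = 1) = 0
    print(p_discrete, p_continuous, p_single_point)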

Moment Generating Function

Moments are expected values of powers of X, such as E(X), E(X²), E(X³), etc. These can also be calculated using the Moment Generating Function (MGF):

M_X(t) = E(e^(tX))

The rth moment of X, E(X^r), can be obtained by differentiating M_X(t) r times with respect to t and setting t = 0:

  • M_X(0) = 1
  • M′_X(0) = E(X)
  • M″_X(0) = E(X²)  ->  V(X) = M″_X(0) - (M′_X(0))²
  • In general, M_X^(r)(0) = E(X^r)

In short, the nth moment is the nth derivative of the MGF evaluated at t = 0.

Uniqueness: if X and Y are two random variables and M_X(t) = M_Y(t) when |t| < h for some positive number h, then X and Y have the same distribution

Note: the MGF does not exist for all distributions (E(e^(tX)) may be infinite)
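As a quick check of the differentiation rule above, here is a minimal sketch in Python using sympy (the Bernoulli(p) example is an assumption for illustration, not taken from these notes):

    import sympy as sp

    t, p = sp.symbols('t p', positive=True)
    M = 1 - p + p * sp.exp(t)           # MGF of a Bernoulli(p) random variable

    EX = sp.diff(M, t, 1).subs(t, 0)    # M'(0)  -> E(X)   = p
    EX2 = sp.diff(M, t, 2).subs(t, 0)   # M''(0) -> E(X^2) = p
    VX = sp.simplify(EX2 - EX**2)       # V(X) = E(X^2) - E(X)^2 = p(1 - p)
    print(EX, EX2, VX)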

Important Distributions

Normal Distribution


X ~ N(μ, σ²)        -∞ < μ < ∞ , σ > 0

  • PDF:

        f_X(x) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²)),   -∞ < x < ∞

  • E(X) = μ
  • V(X) = σ²
  • MGF:

       M_X(t) = e^(μt + σ²t²/2)

Binomial Distribution


X ~ Binomial(n, p)      𝑝 ∈ [0, 1]

X = the number of successes in n trials when the probability of success in each trial is p.

We can think of X as the sum of n independent Bernoulli(p) random variables, with the same p for every X_i:

X = X1 + X2 + … + Xn,   where each Xi ~ Bernoulli(p)

  • PMF:

        P(X = x) = (n choose x) p^x (1 - p)^(n - x),   x = 0, 1, …, n

  • Expected value = E(X) = np
  • Variance = V(X) = np(1-p)
  • MGF = M_X(t) = (pe^t + (1 - p))^n
  • Two discrete random variables X and Y are independent if P(X = x, Y = y) = P(X = x)·P(Y = y) for all x and y

Ex. A study which analyzed the prevalence of a disease in a population.
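To make the "sum of n Bernoulli(p) variables" picture concrete, here is a minimal simulation sketch in Python (numpy/scipy and the chosen n, p are assumptions for illustration):

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(0)
    n, p, reps = 10, 0.3, 100_000

    # Each row is n Bernoulli(p) draws; the row sum behaves like Binomial(n, p)
    sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

    print(sums.mean(), n * p)                      # sample mean vs E(X) = np
    print(sums.var(), n * p * (1 - p))             # sample variance vs V(X) = np(1-p)
    print((sums == 3).mean(), binom.pmf(3, n, p))  # empirical P(X = 3) vs the PMF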

Poisson Distribution


X ~ Poisson(λ)     λ > 0

X = The number of occurrences of an event of interest.

  • PMF:

        P(X = x) = (e^(-λ) λ^x) / x!,   x = 0, 1, 2, …

  • Expected value = E(X) = λ
  • Variance = V(X) = λ
  • MGF = M_X(t) = e^(λ(e^t - 1))

Poisson as an approximation  of the Binomial Distribution

  • If X ~ Binomial(n, p) with n -> ∞ and p -> 0 such that np stays constant, then X is approximately Poisson(np)
  • This assumes each event is independent
  • Often used when analyzing rare diseases

Ex. Analyzing lung cancer in 1000 smokers and non-smokers. This is binomial but can be approximated by a Poisson distribution.
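A minimal sketch in Python (scipy; n and p are assumed values chosen so that np = 2) showing how close the two PMFs get when n is large and p is small:

    from scipy.stats import binom, poisson

    n, p = 1000, 0.002                  # np = 2 held constant
    for k in range(6):
        print(k, binom.pmf(k, n, p), poisson.pmf(k, n * p))   # nearly identical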

Geometric Distribution


X ~ Geometric(p)      𝑝 ∈ (0, 1]

If Y1, Y2, Y3 ... are a sequence of independent Bernoulli(p) random variables then the number of failures before the first success, X, follows a Geometric distribution.

  • PMF = P(X = x) = p(1 - p)^x,   x = 0, 1, 2, …
  • Expected value = E(X) = (1 - p)/p
  • Variance = V(X) = (1 - p)/p²
  • MGF = M_X(t) = p / (1 - (1 - p)e^t)

Ex. We want to know the number of times a coin lands on tails before it first lands on heads.
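A minimal sketch in Python (scipy, assumed for illustration). Note that these notes count the number of failures before the first success, while scipy.stats.geom counts the number of trials up to and including the first success, so it is shifted with loc=-1:

    from scipy.stats import geom

    p = 0.5                                # fair coin, success = heads
    X = geom(p, loc=-1)                    # number of tails before the first head

    print(X.pmf(2), p * (1 - p) ** 2)      # P(X = 2) = p(1-p)^2
    print(X.mean(), (1 - p) / p)           # E(X) = (1-p)/p
    print(X.var(), (1 - p) / p ** 2)       # V(X) = (1-p)/p^2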

Hyper-Geometric Distribution


X ~ Hypergeometric(N, K, n)

Suppose a finite population of size N contains two mutually exclusive types of items: K successes and N - K failures. If n items are chosen at random without replacement, X is the number of successes chosen.

  • PMF:

        P(X = x) = (K choose x)(N - K choose n - x) / (N choose n)

  • Expected value = E(X) = nK / N
  • Variance = V(X) = ((nK) / N) * ((N-K) / N) * ((N - n) / (N - 1))

Ex. A bag has 7 red beads and 13 white beads. If 5 are drawn without replacement what is the probability exactly 4 are red?
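A minimal sketch in Python (scipy, assumed for illustration) that works this bead example with N = 20, K = 7, n = 5, x = 4:

    from math import comb
    from scipy.stats import hypergeom

    p_formula = comb(7, 4) * comb(13, 1) / comb(20, 5)   # (K choose x)(N-K choose n-x) / (N choose n)
    p_scipy = hypergeom.pmf(4, 20, 7, 5)                 # scipy argument order: x, N, K, n
    print(p_formula, p_scipy)                            # both ~ 0.029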

Uniform Distribution


All outcomes are equally likely; the distribution can be discrete or continuous (the formulas below are for the continuous case).

X ~ Uniform(a, b)     a < b

  • PDF:

       f_X(x) = 1 / (b - a) for a ≤ x ≤ b, and 0 otherwise

  • E(X) = (a + b)/2
  • V(X) = (b - a)² / 12
  • CDF = F_X(x) = (x - a) / (b - a),   a ≤ x ≤ b
  • MGF:

        M_X(t) = (e^(tb) - e^(ta)) / (t(b - a)) for t ≠ 0, and M_X(0) = 1

We use this distribution when we have no idea how the data are distributed.
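A minimal sketch in Python (scipy; a and b are assumed values) showing that interval probabilities depend only on interval length, via the CDF (x - a)/(b - a):

    from scipy.stats import uniform

    a, b = 2, 10
    X = uniform(loc=a, scale=b - a)        # scipy parameterizes Uniform(a, b) as loc=a, scale=b-a

    print(X.cdf(5) - X.cdf(3))             # P(3 <= X <= 5) = 2/8 = 0.25
    print(X.cdf(9) - X.cdf(7))             # P(7 <= X <= 9) = 2/8 = 0.25 (same length)
    print(X.mean(), (a + b) / 2)           # E(X) = (a+b)/2
    print(X.var(), (b - a) ** 2 / 12)      # V(X) = (b-a)^2/12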

Log-Normal Distribution


X ~ Lognormal(μ, σ²)    -∞ < μ < ∞, σ > 0

  • PDF:

f_X(x) = (1 / (xσ√(2π))) e^(-(ln x - μ)² / (2σ²)),   x > 0

  • E(X) = e^(μ + σ²/2)
  • Median = e^μ
  • V(X) = (e^(σ²) - 1) e^(2μ + σ²)
  • log(X) ~ N(μ, σ²)  - the log is normal
  • These distributions are often skewed to the right

Ex. Amount of rainfall, milk production by cows, and stock market fluctuations often follow a log-normal distribution.
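A minimal sketch in Python (scipy/numpy; μ and σ are assumed values) mapping Lognormal(μ, σ²) onto scipy's parameterization and checking that the log of the data is normal:

    import numpy as np
    from scipy.stats import lognorm, normaltest

    mu, sigma = 1.0, 0.5
    X = lognorm(s=sigma, scale=np.exp(mu))   # scipy: s = sigma, scale = e^mu

    print(X.mean(), np.exp(mu + sigma**2 / 2))                          # E(X)
    print(X.median(), np.exp(mu))                                       # median = e^mu
    print(X.var(), (np.exp(sigma**2) - 1) * np.exp(2*mu + sigma**2))    # V(X)

    sample = X.rvs(size=10_000, random_state=0)
    print(normaltest(np.log(sample)).pvalue)   # log(X) looks normal: p-value is typically large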

Gamma Distribution


X ~ Gamma(α, λ)    α > 0 , λ > 0

Used to model the waiting time until the αth occurrence of an event (the exponential below is the α = 1 case).

  • PDF

f_X(x) = (λ^α / Γ(α)) x^(α - 1) e^(-λx),   x > 0

An alternate parameterization with α > 0 and θ = 1/λ > 0 is used by R:

f_X(x) = (1 / (Γ(α) θ^α)) x^(α - 1) e^(-x/θ),   x > 0

  • E(X) = α / λ
  • V(X) = α / λ²
  • MGF:

        M_X(t) = (λ / (λ - t))^α,   t < λ

Ex. Used to model time to failure or time to death.
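A minimal sketch in Python (scipy; α and λ are assumed values). Like the alternate parameterization above, scipy's gamma takes the shape α and a scale θ = 1/λ rather than the rate λ:

    from scipy.stats import gamma

    alpha, lam = 3, 2
    X = gamma(a=alpha, scale=1 / lam)      # rate-lambda Gamma via scale = 1/lambda

    print(X.mean(), alpha / lam)           # E(X) = alpha/lambda = 1.5
    print(X.var(), alpha / lam ** 2)       # V(X) = alpha/lambda^2 = 0.75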

Exponential Distribution


A special case of the Gamma Distribution (α = 1)

X ~ Exponential(λ)       λ  > 0

  • PDF = f_X(x) = λe^(-λx) for x > 0
  • E(X) = 1 / λ
  • V(X) = 1 / λ²
  • CDF = F_X(x) = 1 - e^(-λx)
  • MGF = M_X(t) = λ / (λ - t),   t < λ

Ex. The time between geyser eruptions.
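A minimal sketch in Python (scipy; λ is an assumed value) checking the Exponential(λ) facts above and that it matches Gamma(α = 1, λ):

    from math import exp
    from scipy.stats import expon, gamma

    lam = 0.5
    X = expon(scale=1 / lam)                           # scipy's expon also uses scale = 1/lambda

    print(X.mean(), 1 / lam)                           # E(X) = 1/lambda
    print(X.var(), 1 / lam ** 2)                       # V(X) = 1/lambda^2
    print(X.cdf(3), 1 - exp(-lam * 3))                 # CDF = 1 - e^(-lambda x)
    print(X.pdf(3), gamma(a=1, scale=1 / lam).pdf(3))  # same density as Gamma(1, lambda)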

Chi-Square Distribution


Special case of the Gamma Distribution (α = k/2, λ = 1/2)

X ~ χ²(k)    k is a positive integer (degrees of freedom, "df")

  • PDF:

f_X(x) = (1 / (2^(k/2) Γ(k/2))) x^(k/2 - 1) e^(-x/2),   x > 0

  • E(X) = k
  • V(X) = 2k
  • MGF = M_X(t) = (1 - 2t)^(-k/2),   t < 1/2

A single squared Z score follows a chi-square distribution with k = 1. More generally, if Z1, Z2, …, Zm are independent standard normal random variables, then:

Z1² + Z2² + … + Zm² ~ χ²(m)

Very few real-world quantities follow a chi-square distribution; it is mainly used in hypothesis testing.
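A minimal simulation sketch in Python (numpy/scipy; m and the number of replications are assumed values) showing that a sum of m squared standard normals behaves like chi-square with k = m:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    m, reps = 4, 100_000
    Z = rng.standard_normal(size=(reps, m))
    X = (Z ** 2).sum(axis=1)               # sum of m squared Z-scores

    print(X.mean(), chi2(m).mean())        # both ~ k = 4
    print(X.var(), chi2(m).var())          # both ~ 2k = 8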

Bivariate Normal Distribution

A bivariate normal distribution describes two jointly normal random variables, which need not be independent. Each variable is normally distributed on its own, and any linear combination of the two (including their sum) is also normally distributed.

(X1, X2) ~ N(μ, Σ), with mean vector μ = (μ1, μ2) and covariance matrix

    Σ = [ σ1²   σ12 ]
        [ σ12   σ2² ]

σ12 = Cov(X1, X2)

PDF:

f(x1, x2) = (1 / (2π σ1 σ2 √(1 - ρ²))) exp{ -(1 / (2(1 - ρ²))) [ ((x1 - μ1)/σ1)² - 2ρ((x1 - μ1)/σ1)((x2 - μ2)/σ2) + ((x2 - μ2)/σ2)² ] },   where ρ = σ12 / (σ1 σ2)
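A minimal sketch in Python (numpy; the mean vector and covariance values are assumptions for illustration) that samples from a bivariate normal and checks the moments, including the covariance σ12 and the variance of the sum:

    import numpy as np

    mu = np.array([1.0, 2.0])
    sigma12 = 0.6                                   # Cov(X1, X2)
    Sigma = np.array([[1.0, sigma12],
                      [sigma12, 2.0]])              # [[sigma1^2, sigma12], [sigma12, sigma2^2]]

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mu, Sigma, size=100_000)

    print(X.mean(axis=0))                           # ~ (1, 2)
    print(np.cov(X.T))                              # ~ Sigma
    print((X[:, 0] + X[:, 1]).var(), 1.0 + 2.0 + 2 * sigma12)   # V(X1+X2) = sigma1^2 + sigma2^2 + 2*sigma12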

Function of a Discrete Random Variable

Suppose X is a discrete random variable and Y is a function of X.    Y = g(X)

Then Y is also a random variable:   P(Y = y) = P(g(X) = y)

P(Y = y) = Σ P(X = x), summing over all x with g(x) = y

Function of a Continuous Random Variable

Using the same setup as above but assuming the variables are continuous random variables:

The PDF: f_Y(y) = (d/dy) F_Y(y)

The CDF: F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y)

If g is one-to-one (strictly increasing or decreasing), then g has an inverse g⁻¹, and in that case:

f_Y(y) = f_X(g⁻¹(y)) · |d g⁻¹(y)/dy|
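A minimal sketch in Python (numpy/scipy; the choice g(x) = -log(x) with X ~ Uniform(0, 1) is an assumption for illustration) checking the inverse formula numerically. Here g⁻¹(y) = e^(-y) and |d g⁻¹(y)/dy| = e^(-y), so f_Y(y) = 1 · e^(-y), the Exponential(1) density:

    import numpy as np
    from scipy.stats import uniform, expon

    y = np.linspace(0.1, 5, 5)
    f_Y = uniform.pdf(np.exp(-y)) * np.exp(-y)   # f_X(g^-1(y)) * |d g^-1(y)/dy|
    print(f_Y)
    print(expon.pdf(y))                          # matches the Exponential(1) PDF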

Properties of Expectation and Variance

For random variables X, Y and constants a, b:

  • E(aX + b) = aE(X) + b
  • E(X + Y) = E(X) + E(Y)
  • V(aX + b) = a²V(X)
  • V(X + Y) = V(X) + V(Y) + 2Cov(X, Y); if X and Y are independent, V(X + Y) = V(X) + V(Y)

Discrete Multivariate Distributions

The joint PMF of X and Y is p(x, y) = P(X = x, Y = y), with Σ_x Σ_y p(x, y) = 1. The marginal PMF of X is obtained by summing over y: p_X(x) = Σ_y p(x, y).

Continuous Multivariate Distributions

The joint PDF f(x, y) satisfies f(x, y) ≥ 0 and ∫∫ f(x, y) dx dy = 1; probabilities are double integrals of f over the region of interest. The marginal PDF of X is f_X(x) = ∫ f(x, y) dy.

Covariance and Correlation

Correlation measures how strong the (linear) relationship between two variables is:

corr(X, Y) = ρ = Cov(X, Y) / (σ_X σ_Y),   with -1 ≤ ρ ≤ 1

A positive correlation has ρ > 0 and a negative correlation has ρ < 0

Covariance provides information about how the variables vary together:

    cov(X, Y) = E[(X - E(X))(Y - E(Y))]

This is also equivalent to:

    cov(X, Y) = E(XY) - E(X)*E(Y)

Thus if X and Y are independent:

    cov(X, Y) = corr(X, Y) = 0

However, cov(X, Y) = 0 does not imply independence unless X and Y are jointly normally distributed.
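A minimal simulation sketch in Python (numpy; the particular variables are assumptions for illustration) checking cov(X, Y) = E(XY) - E(X)E(Y) and the near-zero covariance of independent variables:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=100_000)
    Y = 2 * X + rng.normal(size=100_000)           # depends on X, so cov(X, Y) = 2

    print(np.mean(X * Y) - X.mean() * Y.mean())    # E(XY) - E(X)E(Y), ~ 2
    print(np.cov(X, Y)[0, 1])                      # sample covariance, also ~ 2
    print(np.corrcoef(X, Y)[0, 1])                 # correlation, between -1 and 1

    Z = rng.normal(size=100_000)                   # independent of X
    print(np.cov(X, Z)[0, 1])                      # ~ 0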

Conditional Expectation of X given Y = y, denoted E(X | Y = y):

E(X | Y = y) = Σ_x x · P(X = x | Y = y) in the discrete case, or ∫ x f(x | y) dx in the continuous case

Conditional variance can be defined similarly (use the conditional PMF or PDF)
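A minimal sketch in Python (numpy; the small joint PMF table is a hypothetical example) computing E(X | Y = y) by normalizing the row of the joint PMF for Y = y:

    import numpy as np

    x_vals = np.array([0, 1, 2])
    # joint[i, j] = P(Y = i, X = x_vals[j]); rows are y = 0 and y = 1, entries sum to 1
    joint = np.array([[0.10, 0.20, 0.10],
                      [0.15, 0.25, 0.20]])

    cond_pmf = joint[1] / joint[1].sum()        # P(X = x | Y = 1)
    E_X_given_Y1 = (x_vals * cond_pmf).sum()    # E(X | Y = 1) ~ 1.083
    print(cond_pmf, E_X_given_Y1)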