# Module 4: Discrete Distributions

For any domain there are infinitely many possible distributions. The most common ones have names: Binomial, Negative Binomial, Geometric, Hypergeometric, Poisson, etc. In this section we focus on the Binomial and Poisson distributions.

The **Bernoulli** **Distribution** is a special member of this family: it is the simplest example of a **Binomial distribution**, with only two possible outcomes (a **dichotomous** distribution). An experiment with only two possible outcomes is called a Bernoulli experiment. Examples include whether a student gets an A on a test, or whether a person has a disease.

Suppose we have two independent Bernoulli trials with equal probability of a positive result. **We refer to that probability as π (the parameter pi, not 3.14)**.

X<sub>1</sub> = { 1 if outcome +, 0 if outcome − } and X<sub>2</sub> = { 1 if outcome +, 0 if outcome − }

Then X = X<sub>1</sub> + X<sub>2</sub>.

X counts the positive outcomes across the two trials, so it is a random variable with domain {0, 1, 2}. Its distribution is an example of a Binomial(2, π) distribution.
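Enumerating the four possible outcome pairs makes this concrete: P(X = 0) = (1 − π)<sup>2</sup>, P(X = 1) = 2π(1 − π), and P(X = 2) = π<sup>2</sup>. The minimal Python sketch below builds the distribution by multiplying the probabilities of the independent outcomes. Python is used only for illustration, and the function name `binomial2_pmf` and the value π = 0.3 are hypothetical choices.

```python
from itertools import product

def binomial2_pmf(pi):
    """Distribution of X = X1 + X2 for two independent Bernoulli(pi) trials,
    built by enumerating all four (X1, X2) outcome pairs."""
    pmf = {0: 0.0, 1: 0.0, 2: 0.0}
    for x1, x2 in product((0, 1), repeat=2):
        # Independence: the joint probability is the product of the marginals.
        p1 = pi if x1 == 1 else 1 - pi
        p2 = pi if x2 == 1 else 1 - pi
        pmf[x1 + x2] += p1 * p2
    return pmf

pi = 0.3  # hypothetical probability of a positive result
pmf = binomial2_pmf(pi)
print(pmf)
```

The two mixed outcomes (+, −) and (−, +) both land on X = 1, which is where the factor 2 in 2π(1 − π) comes from.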

More generally, if X<sub>i</sub> (i = 1, ..., n) are n independent Bernoulli trials, each with probability π of a positive result:

[![image-1660834298993.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660834298993.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660834298993.png)

The domain of X ~ Binomial(n, π) is {0, 1, 2, ..., n}. When n = 1, the Binomial reduces to the Bernoulli distribution.

For k in the domain {0, 1, 2, ..., n}:

[![image-1660840678211.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660840678211.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660840678211.png)

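This formula gives only the probability of an exact value, P(X = k), not of an inequality such as P(X ≤ k). Probabilities of inequalities are obtained by adding up the point probabilities.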

- Where (<sup>n</sup><sub>k</sub>) = n! / (k! (n − k)!), with n! = 1 × 2 × 3 × ... × n and 0! = 1
- Mean = μ = E\[X\] = nπ
- Variance = σ<sup>2</sup> = Var\[X\] = nπ(1 − π)
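A minimal Python sketch of the formula above, assuming illustrative values n = 10 and π = 0.3. The helper names `binom_pmf` and `binom_cdf` are hypothetical. Summing k·P(X = k) and (k − μ)<sup>2</sup>·P(X = k) over the domain recovers nπ and nπ(1 − π), and a cumulative probability such as P(X ≤ 3) is obtained by adding point probabilities:

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi), using the binomial coefficient C(n, k)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def binom_cdf(k, n, pi):
    """P(X <= k): the formula gives only point probabilities, so sum them."""
    return sum(binom_pmf(j, n, pi) for j in range(k + 1))

n, pi = 10, 0.3  # hypothetical parameters
support = range(n + 1)
mean = sum(k * binom_pmf(k, n, pi) for k in support)
var = sum((k - mean) ** 2 * binom_pmf(k, n, pi) for k in support)

print(f"mean = {mean:.4f} (n*pi = {n * pi:.4f})")
print(f"variance = {var:.4f} (n*pi*(1-pi) = {n * pi * (1 - pi):.4f})")
print(f"P(X <= 3) = {binom_cdf(3, n, pi):.4f}")
```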

Note that the variance is a function of the mean (σ<sup>2</sup> = μ(1 − π)), the mean is greater than the variance whenever π > 0, and for a fixed n the variance is maximized at π = 0.5.
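A short calculus check of these claims, using only μ = nπ and σ<sup>2</sup> = nπ(1 − π) from the list above:

```latex
\begin{aligned}
\sigma^2(\pi) &= n\pi(1-\pi) = \mu(1-\pi) = n\pi - n\pi^2 \\
\frac{d\sigma^2}{d\pi} &= n(1 - 2\pi) = 0 \quad\Longrightarrow\quad \pi = \tfrac{1}{2} \\
\frac{d^2\sigma^2}{d\pi^2} &= -2n < 0 \quad\text{(a maximum, with } \sigma^2_{\max} = \tfrac{n}{4}\text{)} \\
\mu - \sigma^2 &= n\pi - n\pi(1-\pi) = n\pi^2 > 0 \quad\text{for } \pi > 0
\end{aligned}
```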

We can construct the standard Z score with:

[![image-1660835389912.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660835389912.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660835389912.png)

When n is large (say n > 25), we can use the standard normal distribution to approximate a binomial distribution; this is an example of the Central Limit Theorem. The **Central Limit Theorem** states that the sum of a large number of independent, identically distributed variables with finite variance is approximately normally distributed, so its standardized Z score is approximately standard normal. This is the basis of much of the inference in applied statistics.
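A rough simulation of this idea, sketched in Python under assumed values (n = 50, π = 0.3, 10,000 repetitions, a fixed seed; the helper `standardized_binomial_draws` is hypothetical). Each draw sums n Bernoulli trials and converts the total to a Z score. If the normal approximation is reasonable, the Z scores should have mean near 0 and variance near 1, and about 95% of them should fall within ±1.96:

```python
import random
import statistics

def standardized_binomial_draws(n, pi, reps, seed=0):
    """Simulate X = sum of n Bernoulli(pi) trials `reps` times and return
    the Z scores (X - n*pi) / sqrt(n*pi*(1-pi))."""
    rng = random.Random(seed)
    mu = n * pi
    sigma = (n * pi * (1 - pi)) ** 0.5
    zs = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < pi)  # one Binomial(n, pi) draw
        zs.append((x - mu) / sigma)
    return zs

zs = standardized_binomial_draws(n=50, pi=0.3, reps=10_000)
coverage = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"mean Z = {statistics.fmean(zs):.3f}, var Z = {statistics.pvariance(zs):.3f}")
print(f"share of |Z| <= 1.96: {coverage:.3f} (standard normal: about 0.95)")
```

Different seeds give slightly different numbers; the point is only that they land close to the standard normal values.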

### Poisson Distribution

Named after the French mathematician Siméon Denis Poisson, who derived it; a famous early application was describing the number of deaths from horse kicks in the Prussian army. It can be used to model the number of events occurring within a given time interval. The probability mass function is:

[![image-1660837060834.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660837060834.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660837060834.png)

where λ is the mean of the distribution (the mean number of events in the interval); λ alone determines the shape of the distribution. Other properties that make the Poisson distribution popular:

- The mean and variance are both equal to λ
- The sum of independent Poisson variables is also a Poisson variable, with mean equal to the sum of the individual means
- The Poisson distribution provides an approximation to the Binomial distribution
- The standard Z score:

[![image-1660837263357.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660837263357.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660837263357.png)
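A small numerical check of the first two properties, sketched in Python with assumed values: λ = 4 for the mean/variance check, and λ<sub>1</sub> = 2, λ<sub>2</sub> = 3 for the sum. The function names are hypothetical. The infinite support is truncated at 100, which is harmless for these small λ:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0
ks = range(100)  # truncated support; the tail beyond 99 is negligible for lam = 4
mean = sum(k * pois_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * pois_pmf(k, lam) for k in ks)
print(f"mean = {mean:.6f}, variance = {var:.6f}")

def convolved_pmf(k, lam1, lam2):
    """P(X1 + X2 = k) for independent X1 ~ Poisson(lam1), X2 ~ Poisson(lam2)."""
    return sum(pois_pmf(j, lam1) * pois_pmf(k - j, lam2) for j in range(k + 1))

# Sum property: Poisson(2) + Poisson(3) should match Poisson(5) point by point.
max_gap = max(abs(convolved_pmf(k, 2.0, 3.0) - pois_pmf(k, 5.0)) for k in range(30))
print(f"largest |P(X1+X2=k) - Poisson(5) pmf|: {max_gap:.2e}")
```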

If n is large and π is small, the Binomial distribution with parameters n and π can be approximated by a Poisson distribution with mean parameter λ = nπ. When λ is itself large, Poisson probabilities can in turn be approximated by the standard normal distribution using the Z score above.
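To see the Binomial-to-Poisson approximation numerically, here is a Python sketch that compares the two probability mass functions for assumed values n = 1000 and π = 0.002, so λ = nπ = 2. The helper names are hypothetical:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def pois_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, pi = 1000, 0.002  # large n, small pi (hypothetical values)
lam = n * pi         # Poisson mean parameter n * pi
for k in range(6):
    print(f"k={k}: binomial {binom_pmf(k, n, pi):.5f}  poisson {pois_pmf(k, lam):.5f}")

# Largest pointwise difference over k = 0..49.
max_gap = max(abs(binom_pmf(k, n, pi) - pois_pmf(k, lam)) for k in range(50))
print(f"max |difference| = {max_gap:.2e}")
```

The gap is computed only for k < 50: both probabilities are negligible beyond that, and stopping early also avoids converting very large factorials to floating point.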

When approximating a discrete binomial distribution with a continuous (normal) distribution, we must apply a continuity correction:

[![image-1660847869791.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660847869791.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660847869791.png)

Think about it like this:

[![image-1660870087072.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660870087072.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660870087072.png)

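A binomial variable can only take integer values {0, 1, 2, ...}, so its probability sits on those points, while the normal distribution is continuous. Treating each integer k as the interval from k − 0.5 to k + 0.5, we add or subtract 0.5 at the boundary depending on whether that point should be included: P(X ≤ k) uses k + 0.5, while P(X < k) = P(X ≤ k − 1) uses k − 0.5.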
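A Python sketch of the continuity correction for an assumed Binomial(30, 0.4) variable. The values and helper names are illustrative, and the standard normal CDF is built from Python's `math.erf`. It compares exact probabilities with the normal approximation, with and without the ±0.5 adjustment:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, pi):
    """Exact P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))

def normal_cdf(z):
    """Standard normal CDF Phi(z), written with math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, pi, k = 30, 0.4, 12  # hypothetical example values
mu, sigma = n * pi, sqrt(n * pi * (1 - pi))

# P(X <= k): include the point k, so the boundary moves up to k + 0.5.
exact = binom_cdf(k, n, pi)
plain = normal_cdf((k - mu) / sigma)
corrected = normal_cdf((k + 0.5 - mu) / sigma)
print(f"P(X <= {k}): exact {exact:.4f}  uncorrected {plain:.4f}  corrected {corrected:.4f}")

# P(X >= k) = 1 - P(X <= k - 1): the boundary moves down to k - 0.5.
upper_exact = 1 - binom_cdf(k - 1, n, pi)
upper_corrected = 1 - normal_cdf((k - 0.5 - mu) / sigma)
print(f"P(X >= {k}): exact {upper_exact:.4f}  corrected {upper_corrected:.4f}")
```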

Useful R functions:

[![image-1660852336215.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/scaled-1680-/image-1660852336215.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-08/image-1660852336215.png)

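The general rule for distribution functions in R is a one-letter prefix on the distribution name (e.g. binom, pois):

- d: probability at a single point (the density; for discrete distributions, the probability mass function), e.g. dbinom
- p: cumulative distribution function, e.g. pbinom
- q: quantile function, e.g. qbinom
- r: random number generation, e.g. rbinom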
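For readers following along in Python rather than R, the same d/p/q/r convention can be mimicked for the Binomial distribution. This is a hypothetical sketch. The function names copy R's `dbinom`, `pbinom`, `qbinom`, and `rbinom` and aim to follow their definitions, with `qbinom` returning the smallest x such that P(X ≤ x) ≥ p. They are not R's implementations and skip its argument checking and vectorization.

```python
import random
from math import comb

# Hypothetical Python stand-ins named after R's d/p/q/r convention
# for Binomial(size, prob); these are illustrations, not R itself.

def dbinom(x, size, prob):
    """d: probability of a single point, P(X = x)."""
    return comb(size, x) * prob**x * (1 - prob) ** (size - x)

def pbinom(q, size, prob):
    """p: cumulative probability P(X <= q)."""
    return sum(dbinom(j, size, prob) for j in range(q + 1))

def qbinom(p, size, prob):
    """q: quantile, the smallest x with P(X <= x) >= p."""
    for x in range(size + 1):
        if pbinom(x, size, prob) >= p:
            return x
    return size

def rbinom(n, size, prob, seed=None):
    """r: n random draws, each the number of successes in `size` trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < prob for _ in range(size)) for _ in range(n)]

print(dbinom(3, 10, 0.5), pbinom(3, 10, 0.5), qbinom(0.5, 10, 0.5))
print(rbinom(5, 10, 0.5, seed=1))
```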