Count data are commonly modelled using a generalised linear model with a Poisson likelihood, and this approach extends trivially to the case of multiple count-valued random variables, $Z_i\in\mathbb{Z}^{0+},\ i=1\ldots N$, which are independent of one another, $Z_i\perp Z_j\ \forall i\neq j$, because the likelihood factorises nicely. But what about the case where there is some correlation?

In this short post I’ll describe the case in which we have two correlated count variables and we believe that a linear function of some known covariates can explain variation in the data.

Univariate Poisson regression and the MLE fit

The Poisson distribution is specified by a single parameter, $\lambda_i\in\mathbb{R}^{+}$, which turns out to be both its mean and its variance. Its PMF for an observation, $z_i$, is given by

$$\text{Po}(z_i | \lambda_i) = \frac{\lambda_i^{z_i}e^{-\lambda_i}}{z_i!}.$$

Assuming the data are IID, the log-likelihood is given by:

$$\mathcal{L}(\lambda_i | z_i) = \sum_i^n \big(z_i\log(\lambda_i) - \lambda_i - \log(z_i!)\big).$$
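As a quick sanity check, with no covariates and a single constant $\lambda$, setting the derivative of this to zero recovers the sample mean:

$$\frac{\partial\mathcal{L}}{\partial\lambda} = \sum_i^n \Big(\frac{z_i}{\lambda} - 1\Big) = 0 \quad\Rightarrow\quad \hat\lambda = \frac{1}{n}\sum_i^n z_i.$$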

The last piece we need is a 'link function'. There are a couple of options we can use to relate the Poisson parameter to our model parameters, $\beta_j$, and the observed covariates, $x_{ij}$, but the common choice is the log-link:

$$\log\lambda_i = \vec{\beta}\cdot \vec{x}_i$$

Typically we would take derivatives of the log-likelihood with respect to the $\beta_j$ using the chain rule and use the iteratively reweighted least squares algorithm (see page 294 of [1] for details on that approach), but here we can be lazy and let scipy's minimize (which defaults to BFGS with a numerically estimated gradient) do the hard work for us. This needs more function evaluations than if we supplied the gradient of the negative log-likelihood ourselves, but everything here runs in O(seconds) on my laptop anyway.
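If we did want to speed things up, the gradient of the Poisson NLL with respect to the betas has the simple closed form $X^\top(\lambda - z)$ and could be handed to scipy's minimize via its jac argument. A minimal sketch (not used in the code below):

import numpy as np

# Gradient of the Poisson negative log-likelihood w.r.t. the betas.
# This could be passed as optimize.minimize(..., jac=poisson_nll_grad)
# to avoid numerical differentiation; the fits below don't bother.
def poisson_nll_grad(betas, covariates, targets):
    lambda_ = np.exp(covariates.dot(betas))
    return covariates.T.dot(lambda_ - targets)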

We’re nearly ready to start fitting some models, but first some housekeeping to make our life easier later:

from dataclasses import dataclass
from typing import Optional
import numpy as np
import pandas as pd
from scipy import optimize

@dataclass
class Dims:
    N: int  # number of observations
    M: int  # number of covariates

@dataclass
class ObservedData:
    covariates: np.ndarray
    Z0: np.ndarray
    Z1: Optional[np.ndarray] = None

@dataclass
class PoissonData:
    dims: Dims
    observed: ObservedData

We can generate some fake data by drawing some random covariates, choosing some true model $\beta$'s, computing the resulting $\lambda$'s, and then sampling from the corresponding Poisson distributions:

def generate_poisson_data(true_betas: np.ndarray, N: int, sigma: float=1) -> PoissonData:
    M = true_betas.shape[0]
    covariates = np.random.normal(0, sigma, (N, M))
    # Fix the first column to allow for a constant term in the regression
    covariates[:, 0] = 1
    lambdas = np.exp(covariates.dot(true_betas))
    poisson_draws = np.random.poisson(lambdas)
    return PoissonData(Dims(N, M), ObservedData(covariates, poisson_draws))

and, as above, in order to fit the model we need an expression for the negative log-likelihood (negative so that the problem becomes a minimisation), and, for neatness and further use, we'll want a little wrapper around scipy's minimize function:

def poisson_nll(betas, covariates, targets):
    loglambda = covariates.dot(betas)
    lambda_ = np.exp(loglambda)
    return -np.sum(targets * loglambda - lambda_)

def poisson_regression(covariates, targets):
    betas = np.ones(covariates.shape[1])
    r = optimize.minimize(poisson_nll, betas, args=(covariates, targets), options={'gtol': 1e-2})
    assert r.success, f"Poisson Regression failed with status: {r.status} ({r.message})"
    return r.x

and with these we can generate, and fit, some Poisson data from the following model:

$$\begin{align} z_i \mid \vec{x}_i, \vec{\beta} &\sim \text{Po}(\lambda_i) \nonumber \\ \log(\lambda_i) &= 1 + 2x_1 - x_2 \nonumber \end{align}$$

true_betas = np.array([1, 2, -1])
dataset = generate_poisson_data(true_betas, 1000)
poisson_regression(dataset.observed.covariates, dataset.observed.Z0)
>>> array([ 0.99064907,  2.00878208, -0.98579406])

Of course the usual caveat applies: if you're trying to build a univariate Poisson regression in Python, don't use my code, use statsmodels:

import statsmodels.api as sm
fit = sm.GLM(
    dataset.observed.Z0,
    dataset.observed.covariates,
    family=sm.families.Poisson()
).fit()
print(fit.summary())

[Output: statsmodels GLM fit summary table]

We can see that our fit is in good agreement with statsmodels, but we also get lots of other interesting information (coefficient uncertainty, NLL, deviance, etc.).

Correlations between two Poisson random variables

As mentioned in the introduction, if we have two Poisson random variables, $Z_1$ and $Z_2$, with no correlation (known as the 'Double Poisson' model) then the above approach still holds since the likelihood factorises:

$$\text{Po}(z_{1i}, z_{2i} | \lambda_{1i}, \lambda_{2i}) = \frac{\lambda_{1i}^{z_{1i}}e^{-\lambda_{1i}}}{z_{1i}!}\cdot\frac{\lambda_{2i}^{z_{2i}}e^{-\lambda_{2i}}}{z_{2i}!}.$$

$$\begin{align} \Rightarrow \mathcal{L}(\lambda_{1i}, \lambda_{2i} | z_{1i}, z_{2i}) &= \sum_i^n \big(z_{1i}\log(\lambda_{1i}) - \lambda_{1i} - \log(z_{1i}!)\big) + \nonumber \\ &\quad \sum_i^n \big(z_{2i}\log(\lambda_{2i}) - \lambda_{2i} - \log(z_{2i}!)\big) \nonumber \end{align}$$

and to maximise this we can just maximise the two sums separately. When doing so, all of the derivative cross-terms, $\partial_{\vec{\beta}_n} \vec{\lambda}_m$, are zero for $n \neq m$ because of our 'no correlation' stipulation. In code, this amounts to two independent univariate fits, as sketched below.
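A minimal sketch, reusing the poisson_regression helper from above (the toy covariates and betas here are made up purely for illustration):

# With no correlation, the joint MLE is just two independent Poisson regressions
# over a shared covariate matrix.
X = np.random.normal(0, 1, (1000, 2))
X[:, 0] = 1  # constant term
Z1 = np.random.poisson(np.exp(X.dot(np.array([1.0, 0.5]))))
Z2 = np.random.poisson(np.exp(X.dot(np.array([0.5, -0.5]))))
betas_1 = poisson_regression(X, Z1)
betas_2 = poisson_regression(X, Z2)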

But what if we suspect that our two random variables are correlated?

The Bivariate Poisson model deals with exactly this case: let $Y_0, Y_1, Y_2$ be independent univariate Poisson distributed with parameters $\lambda_0, \lambda_1, \lambda_2$ respectively. Then

$$\begin{align} Z_0 &= Y_0 + Y_2 \nonumber \\ Z_1 &= Y_1 + Y_2 \nonumber \end{align}$$

are bivariate Poisson distributed. Clearly $Z_0$ and $Z_1$ are marginally Poisson and therefore we have:

$$\begin{align} \mathbb{E}[Z_0] &= \lambda_0 + \lambda_2 \nonumber \\ \mathbb{E}[Z_1] &= \lambda_1 + \lambda_2 \nonumber \end{align}$$

since the sum of two independent Poisson random variables is another Poisson whose parameter is the sum of the two input parameters. But more interestingly we also have

$$\begin{align} \text{Cov}(Z_0, Z_1) &= \text{Cov}(Y_0 + Y_2, Y_1 + Y_2) \nonumber \\ &= \text{Cov}(Y_0, Y_1) + \text{Cov}(Y_0, Y_2) + \text{Cov}(Y_2, Y_1) + \text{Cov}(Y_2, Y_2) \nonumber \\ &= \text{Cov}(Y_2, Y_2) \nonumber \\ &= \text{Var}(Y_2) \nonumber \\ &= \lambda_2 \nonumber \end{align}$$

where $\text{Cov}(Y_i, Y_j) = 0$ for $i\neq j$ by construction of the $Y_j$'s. We can verify this with a quick simulation, shown below.
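A quick numerical check of the covariance result (a sketch; the specific $\lambda$ values are arbitrary):

# Simulate the latent construction Z0 = Y0 + Y2, Z1 = Y1 + Y2 and check
# that the sample covariance of (Z0, Z1) is close to lambda_2.
l0, l1, l2 = 1.5, 2.0, 0.8
y0 = np.random.poisson(l0, 100_000)
y1 = np.random.poisson(l1, 100_000)
y2 = np.random.poisson(l2, 100_000)
z0, z1 = y0 + y2, y1 + y2
print(np.cov(z0, z1)[0, 1])  # should be close to 0.8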

The PMF of the Bivariate Poisson model is given by:

$$f_\text{BP}(Z_0, Z_1 | \lambda_0, \lambda_1, \lambda_2) = e^{-\lambda_0-\lambda_1-\lambda_2}\frac{\lambda_0^{z_0}}{z_0!}\frac{\lambda_1^{z_1}}{z_1!}\sum_{k=0}^{\min(z_0, z_1)} \binom{z_0}{k}\binom{z_1}{k}k!\left(\frac{\lambda_2}{\lambda_0\lambda_1}\right)^k$$

We can see that when $\lambda_2\rightarrow0$ this reduces to the Double Poisson model, since only the $k=0$ term in the sum contributes and it contributes $1$, leaving only the prefactor.
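Explicitly, setting $\lambda_2 = 0$ gives

$$f_\text{BP}(Z_0, Z_1 | \lambda_0, \lambda_1, 0) = e^{-\lambda_0-\lambda_1}\frac{\lambda_0^{z_0}}{z_0!}\frac{\lambda_1^{z_1}}{z_1!} = \text{Po}(z_0 | \lambda_0)\,\text{Po}(z_1 | \lambda_1).$$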

The problem we now have to solve then is:

$$\begin{align} Z_{0i}, Z_{1i} &\sim \text{BP}(Z_{0i}, Z_{1i} | \lambda_{0i}, \lambda_{1i}, \lambda_{2i}) \nonumber \\ \log\lambda_{ki} &= \vec{\beta}_k \cdot \vec{x}_i \nonumber \end{align}$$

where $i$ indexes over the paired datapoints, with their associated covariates, and $k = 0, 1, 2$. It is possible that the $\lambda_k$ depend on different (or even disjoint) subsets of the covariates, but we can always treat this case using a single $\vec{x}_i$ created by concatenating the different features and enforcing that the pertinent components of $\vec{\beta}_k$ are zero.

Fitting the Bivariate Poisson with the EM algorithm

We saw above that we can directly maximise the likelihood of a univariate Poisson distribution, but the Bivariate Poisson case is more awkward: the unobserved nature of $Y_2$ means that there is no neat closed-form expression for the MLE.

Instead we can follow the approach taken in the Karlis and Ntzoufras paper [2] and take an Expectation-Maximisation approach [3] [4]. I also recommend Ntzoufras’s book which describes lots of interesting models [5].

Computing The Likelihood

We can write down a seemingly useless form for the bivariate model, one expressed in terms of all three of the unobserved Poisson components:

$$f(Y_0, Y_1, Y_2 | \Theta) = \prod_{i=1}^n \prod_{k=0}^{2} \frac{e^{-\lambda_k} \lambda_k^{y_{ki}}}{y_{ki}!}$$

We can perform a change of variables so that our model is in terms of the two observed random variables, $(Z_0, Z_1)$, and the single latent variable $Y_2$. The determinant of the Jacobian of this transformation is 1, so we have

$$L(\lambda_0, \lambda_1, \lambda_2 | Z_0, Z_1, Y_2) = \prod_{i=1}^n \frac{e^{-\lambda_0} \lambda_0^{z_{0i} - y_{2i}}}{(z_{0i} - y_{2i})!} \frac{e^{-\lambda_1} \lambda_1^{z_{1i} - y_{2i}}}{(z_{1i} - y_{2i})!} \frac{e^{-\lambda_2} \lambda_2^{y_{2i}}}{y_{2i}!}$$

and therefore the log-likelihood is

$$\begin{align} \mathcal{L}(\lambda_0, \lambda_1, \lambda_2 | Z_0, Z_1, Y_2) &= -n\lambda_0 + \sum_{i=1}^n(z_{0i} - y_{2i})\log\lambda_0 - \log\prod_{i=1}^n(z_{0i} - y_{2i})! \nonumber \\ &\quad -n\lambda_1 + \sum_{i=1}^n(z_{1i} - y_{2i})\log\lambda_1 - \log\prod_{i=1}^n(z_{1i} - y_{2i})! \nonumber \\ &\quad -n\lambda_2 + \sum_{i=1}^n y_{2i}\log\lambda_2 - \log\prod_{i=1}^ny_{2i}! \nonumber \end{align}$$
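Note that if the $y_{2i}$ were known, maximising this with respect to constant $\lambda$'s would have a simple closed form, e.g.

$$\frac{\partial\mathcal{L}}{\partial\lambda_0} = -n + \frac{1}{\lambda_0}\sum_{i=1}^n(z_{0i} - y_{2i}) = 0 \quad\Rightarrow\quad \hat\lambda_0 = \frac{1}{n}\sum_{i=1}^n(z_{0i} - y_{2i})$$

and similarly for $\lambda_1$ and $\lambda_2$.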

This is almost the same as the 'Double Poisson' model and we know we can maximise it with respect to the $\lambda$'s, except that we don't know the value of the $y_{2i}$. This is where the Expectation-Maximisation algorithm comes in.

Computing The Density of the Latent Variable

To get the $\mathcal{Q}$ function we need to compute the expectation of the log-likelihood with respect to the density $f(Y_2|Z_0, Z_1, \lambda_k^{(t)})$, where the $\lambda_k^{(t)}$ are our current guesses for the lambdas. Using Bayes' theorem we can write this as:

$$f(Y_2|Z_0, Z_1, \lambda_k^{(t)}) = \frac{f(Z_0, Z_1 | Y_2, \lambda_k^{(t)})f(Y_2 | \lambda_k^{(t)})}{f(Z_0, Z_1 | \lambda_k^{(t)})}$$

The denominator is just the Bivariate Poisson density and $f(Y_2 | \lambda_k^{(t)})$ is just the Poisson density. The last term, $f(Z_0, Z_1 | Y_2, \lambda_k^{(t)})$, is a little harder to reason about, but if we look at the definitions of $Z_0, Z_1$ and imagine that $Y_2$ is just a constant we see the following is true

$$f(Z_0, Z_1 | Y_2, \lambda_k^{(t)}) = f(Y_0 = Z_0 - Y_2 | \lambda_k^{(t)})f(Y_1 = Z_1 - Y_2 | \lambda_k^{(t)})$$

and both of the distributions on the right-hand side are just more Poisson distributions. Thus:

$$f(Y_2|Z_0, Z_1, \lambda_k^{(t)}) = \frac{\text{Po}(Y_0 = Z_0 - Y_2 | \lambda_k^{(t)})\text{Po}(Y_1 = Z_1 - Y_2 | \lambda_k^{(t)})\text{Po}(Y_2 | \lambda_k^{(t)})}{f_\text{BP}(Z_0, Z_1 | \lambda_k^{(t)})}$$

Computing The Expectation of the Latent Variable

With this density in hand we can compute the expectation of the unobserved $Y_2$ conditioned on the observed data:

$$\mathbb{E}_{Y_2|Z_0, Z_1, \lambda_k^{(t)}}[Y_2] = \sum_{j=0}^\infty j\frac{\text{Po}(z_0 - j | \lambda_0)\text{Po}(z_1 - j | \lambda_1)\text{Po}(j | \lambda_2)}{f_\text{BP}(Z_0, Z_1 | \lambda_k^{(t)})}$$

The denominator is independent of our index, $j$, and we can change the bounds of the summation in two ways:

  • Firstly we can use the fact that the $j=0$ term in the sum is clearly zero to shift the start of the summation to $j=1$,
  • Secondly we see that for any $j>z_0$ or $j>z_1$ one of the Poisson densities is zero (the domain of the Poisson is the non-negative integers). Therefore the upper bound of the sum is $\min(z_0, z_1)$.

Therefore

$$\mathbb{E}[Y_2]= \frac{1}{f_\text{BP}(Z_0, Z_1 | \lambda_k^{(t)})}\sum_{j=1}^{\min(z_0, z_1)} j \ \text{Po}(z_0 - j | \lambda_0)\text{Po}(z_1 - j | \lambda_1)\text{Po}(j | \lambda_2)$$

Substituting in the density of the Poisson (and suppressing, for now, the EM iteration index $(t)$):

$$\mathbb{E}[Y_2]= \frac{1}{f_\text{BP}(Z_0, Z_1 | \lambda_k)}\sum_{j=1}^{\min(z_0, z_1)} j \frac{\lambda_0^{z_0 - j} e^{-\lambda_0}}{(z_0 - j)!} \frac{\lambda_1^{z_1 - j} e^{-\lambda_1}}{(z_1 - j)!} \frac{\lambda_2^{j} e^{-\lambda_2}}{j!}$$

We can pull out some factors of the lambdas:

$$\mathbb{E}[Y_2] = \frac{\lambda_2 e^{-\lambda_0-\lambda_1-\lambda_2}\lambda_0^{z_0-1}\lambda_1^{z_1-1}}{f_\text{BP}(Z_0, Z_1 | \lambda_k)} \sum_{j=1}^{\min(z_0, z_1)} j \frac{1}{(z_0 - j)!(z_1 - j)!j!}\left(\frac{\lambda_2}{\lambda_0\lambda_1}\right)^{j - 1}$$

Now for some more index trickery: we let $p = j - 1$ in the sum:

$$\begin{align} &= \frac{\lambda_2 e^{-\lambda_0-\lambda_1-\lambda_2}\lambda_0^{z_0-1}\lambda_1^{z_1-1}}{f_\text{BP}(Z_0, Z_1 | \lambda_k)} \sum_{p = 0}^{\min(z_0 - 1, z_1 - 1)} (p + 1) \frac{1}{(z_0 - p - 1)!(z_1 - p - 1)!(p + 1)!}\left(\frac{\lambda_2}{\lambda_0\lambda_1}\right)^{p} \nonumber \\ &= \frac{\lambda_2 e^{-\lambda_0-\lambda_1-\lambda_2}\lambda_0^{z_0-1}\lambda_1^{z_1-1}}{f_\text{BP}(Z_0, Z_1 | \lambda_k)(z_0 - 1)!(z_1 - 1)!} \sum_{p = 0}^{\min(z_0 - 1, z_1 - 1)} \frac{(z_0 - 1)!(z_1 - 1)!}{(z_0 - p - 1)!(z_1 - p - 1)!p!}\left(\frac{\lambda_2}{\lambda_0\lambda_1}\right)^{p} \nonumber \end{align}$$

we recognise the combination of factorials inside the sum as binomial coefficients

$$\begin{align} &= \frac{\lambda_2 e^{-\lambda_0-\lambda_1-\lambda_2}\lambda_0^{z_0-1}\lambda_1^{z_1-1}}{f_\text{BP}(Z_0, Z_1 | \lambda_k)(z_0 - 1)!(z_1 - 1)!}\sum_{p = 0}^{\min(z_0 - 1, z_1 - 1)} {z_0 - 1\choose p} {z_1 - 1\choose p} p! \left(\frac{\lambda_2}{\lambda_0\lambda_1}\right)^{p} \nonumber \end{align}$$

and finally we have:

$$y_2^{(t)} \equiv \mathbb{E}_{Y_2|Z_0, Z_1, \lambda_k^{(t)}}[Y_2] = \lambda_2\frac{f_\text{BP}(Z_0 - 1, Z_1 - 1 | \lambda_k^{(t)})}{f_\text{BP}(Z_0, Z_1 | \lambda_k^{(t)})}$$
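As a quick numerical sanity check of this closed form (a sketch, using scipy.stats; the helper names and test values here are made up for illustration), we can compare it against brute-force summation of the convolution:

from scipy.stats import poisson

def f_bp_brute(z0, z1, l0, l1, l2):
    # f_BP(z0, z1) as a finite convolution sum over the latent Y2 = j
    return sum(
        poisson.pmf(z0 - j, l0) * poisson.pmf(z1 - j, l1) * poisson.pmf(j, l2)
        for j in range(0, min(z0, z1) + 1)
    )

def e_y2_brute(z0, z1, l0, l1, l2):
    # E[Y2 | Z0 = z0, Z1 = z1] by direct summation
    return sum(
        j * poisson.pmf(z0 - j, l0) * poisson.pmf(z1 - j, l1) * poisson.pmf(j, l2)
        for j in range(1, min(z0, z1) + 1)
    ) / f_bp_brute(z0, z1, l0, l1, l2)

z0, z1, l0, l1, l2 = 4, 6, 1.5, 2.0, 0.8
print(e_y2_brute(z0, z1, l0, l1, l2))
print(l2 * f_bp_brute(z0 - 1, z1 - 1, l0, l1, l2) / f_bp_brute(z0, z1, l0, l1, l2))
# the two printed values should agree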

The EM algorithm

The auxiliary function, $\mathcal{Q}$, is then just the log-likelihood where we plug in the expected value of the latent variable we just computed

$$\begin{align} \mathcal{Q}(\lambda | \lambda_{k}^{(t)}) &= -n\lambda_0 + \sum_{i=1}^n(z_{0i} - y_{2i}^{(t)})\log\lambda_0 \nonumber \\ &\quad -n\lambda_1 + \sum_{i=1}^n(z_{1i} - y_{2i}^{(t)})\log\lambda_1 \nonumber \\ &\quad -n\lambda_2 + \sum_{i=1}^n y_{2i}^{(t)}\log\lambda_2 \nonumber \end{align}$$

where the terms which don't depend on the $\lambda_k$ have been dropped. Remembering for a moment that the lambdas are themselves functions of the regression weights:

$$\log\lambda_{ki} = \vec{\beta}_k \cdot \vec{x}_i$$

we can see that to maximise the $\mathcal{Q}$ function we have to perform three separate Poisson regressions to compute the weights $\vec{\beta}_k$, one for each of the three lambdas (they are listed in the M-step below). In summary then, the EM iterations look like this:

  • E-step: Using our current best guess for the model weights, $\beta^{(t-1)}$, we compute $y_{2i}^{(t)}$ for each datapoint,
  • M-step: We perform the following three Poisson regressions $$\begin{align} z_{0i} - y_{2i}^{(t)} &\sim \text{Po}(\lambda_{0i}^{(t)}), \ \log\lambda_{0i}^{(t)} = \vec{\beta}_0^{(t)}\cdot\vec{x}_i \nonumber \\ z_{1i} - y_{2i}^{(t)} &\sim \text{Po}(\lambda_{1i}^{(t)}), \ \log\lambda_{1i}^{(t)} = \vec{\beta}_1^{(t)}\cdot\vec{x}_i \nonumber \\ y_{2i}^{(t)} &\sim \text{Po}(\lambda_{2i}^{(t)}), \ \log\lambda_{2i}^{(t)} = \vec{\beta}_2^{(t)}\cdot\vec{x}_i \nonumber \end{align}$$

to find our next estimates for the model weights $\vec{\beta}_k^{(t)}$.

Implementation in Python

And now let's implement all of this in Python. We start by generating the latent $Y_j$ data from independent Poissons and then combining them to form the $Z_i$:

@dataclass
class LatentData:
    lambdas: np.array
    Y0: np.array
    Y1: np.array
    Y2: np.array

@dataclass
class BivariateData:
    dims: Dims
    latent: LatentData
    observed: ObservedData

def generate_observed_bivariate_poisson_data(YY_latent: np.array) -> np.array:
    return np.stack([
        YY_latent[0,:] + YY_latent[2,:],
        YY_latent[1,:] + YY_latent[2,:],
    ])

def generate_bivariate_poisson_data(true_betas: np.array, N: int, sigma: float) -> BivariateData:
    M = true_betas.shape[0]
    covariates = np.random.normal(0, sigma, (N, M))
    # Fix the first column to allow for a constant term in the regression
    covariates[:, 0] = 1
    # Build the lambdas from the betas, and then sample from the Poissons
    lambdas = np.exp(covariates.dot(true_betas))
    latent_poisson_draws = np.random.poisson(lambdas)
    observations = generate_observed_bivariate_poisson_data(latent_poisson_draws.T)
    return BivariateData(
        Dims(N, M),
        LatentData(lambdas, *latent_poisson_draws.T),
        ObservedData(covariates, *observations),
    )

true_betas = np.array([
    [0.3,  0.2,  0.0,  0.3, 0.0,  0.0],
    [0.5,  0.0, -0.1,  0.0, 0.0, -0.5],
    [0.0,  0.0,  0.0,  0.0, 1.0,  0.0],
]).T

dataset = generate_bivariate_poisson_data(true_betas, N=10000, sigma=1)

We can check the correlation matrix of the latent data, and the observed data to see that this has worked as expected:

import pandas as pd

pd.DataFrame(
    data=np.array([
        dataset.latent.Y0,
        dataset.latent.Y1,
        dataset.latent.Y2,
    ]).T,
    columns=['Y0', 'Y1', 'Y2']
).corr()

[Output: correlation matrix of the latent Y0, Y1, Y2]

import pandas as pd

pd.DataFrame(
    data=np.array([
        dataset.observed.Z0,
        dataset.observed.Z1,
    ]).T,
    columns=['Z0', 'Z1']
).corr()

[Output: correlation matrix of the observed Z0, Z1]

The $Z_i$'s have a strong degree of correlation; so far so good. We need the Bivariate Poisson density, $f_\text{BP}(Z_0, Z_1 | \lambda_0, \lambda_1, \lambda_2)$:

from scipy.special import factorial

EPS = 1e-16

def choose(n: int, p: int) -> int:
    return factorial(n) / factorial(p) / factorial(n - p)

def bivariate_poisson(x: int, y: int, l0: float, l1: float, l2: float) -> float:
    coeff = (
        np.exp(-l0 - l1 - l2) * np.power(l0, x) *
        np.power(l1, y) / factorial(x) / factorial(y)
    )
    p = coeff * sum([
        choose(x, i) * choose(y, i) * factorial(i) * (l2 / l0 / l1) ** i
        for i in range(0, min(x, y) + 1)
    ])
    return max(p, EPS)

For the E-step of the EM algorithm we must compute the expectation $y_2^{(t)} = \mathbb{E}_{Y_2|Z_0, Z_1, \lambda_k^{(t)}}[Y_2]$ for each datapoint, $i$:

def _compute_next_y2ti(x, y, l0, l1, l2):
    if min(x, y) < 1:
        return 0
    return l2 * bivariate_poisson(x - 1, y - 1, l0, l1, l2) / bivariate_poisson(x, y, l0, l1, l2)

def compute_next_y2ti(observed_data: ObservedData, lambdas: np.array) -> np.array:
    return np.array([
        _compute_next_y2ti(z0, z1, *lams)
        for (z0, z1, lams) in zip(observed_data.Z0, observed_data.Z1, lambdas)
    ])

For the M-step we can re-use the 1D Poisson regression code from earlier in this post to compute the betas, and from the betas we get our new estimate for the lambdas:

def compute_next_betas(
    data: BivariateData,
    y2ti: np.array,
) -> np.array:
    return np.array([
        poisson_regression(data.observed.covariates, data.observed.Z0 - y2ti),
        poisson_regression(data.observed.covariates, data.observed.Z1 - y2ti),
        poisson_regression(data.observed.covariates, y2ti),
    ]).T

def compute_next_lambdas(betas, covariate_matrix):
    return np.exp(covariate_matrix.dot(betas))

That is everything we really need to fit this model, but to make the output more interesting we should include a little extra code to track the negative log-likelihood of the model, and the mean squared error to see that the EM algorithm is converging:

@dataclass
class FitResult:
    betas: np.array
    neg_loglikelihood: Optional[float] = None

def bivariate_poisson_nll(betas: np.array, observed_data: ObservedData):
    lambdas = np.exp(observed_data.covariates.dot(betas))
    return -np.sum([
        np.log(bivariate_poisson(z0, z1, *lams))
        for (z0, z1, lams) in zip(observed_data.Z0, observed_data.Z1, lambdas)
    ])

def bivariate_poisson_prediction(betas, observed_data: ObservedData):
    lambdas = np.exp(observed_data.covariates.dot(betas))
    return np.stack([lambdas[:, 0] + lambdas[:, 2], lambdas[:, 1] + lambdas[:, 2]])

def compute_mse(predictions, Z0, Z1):
    return np.mean((predictions[0,:] - Z0 + predictions[1,:] - Z1) ** 2)

def compute_bivariate_poisson_data_mse(betas: np.array, observed_data: ObservedData) -> float:
    preds = bivariate_poisson_prediction(betas, observed_data)
    return compute_mse(preds, observed_data.Z0, observed_data.Z1)

And finally we fit the model:

def bivariate_poisson_fit_using_em(
    dataset: BivariateData,
    em_max_itr: int=20,
    logging: bool=False
) -> FitResult:
    # Construct some initial guesses for the lambdas and betas
    betas = np.zeros((dataset.dims.M, 3))
    lambdas = compute_next_lambdas(betas, dataset.observed.covariates)

    for itr in range(1, em_max_itr + 1):
        nll = bivariate_poisson_nll(betas, dataset.observed)
        bivpoisson_mse = compute_bivariate_poisson_data_mse(betas, dataset.observed)
        if logging:
            print(f"{itr:<4} {nll:<10.3f} {bivpoisson_mse:.3f}")

        # E-step:
        y2ti = compute_next_y2ti(dataset.observed, lambdas)

        # M-step:
        betas = compute_next_betas(dataset, y2ti)
        lambdas = compute_next_lambdas(betas, dataset.observed.covariates)

    return FitResult(betas, nll)

fit_bp = bivariate_poisson_fit_using_em(dataset, logging=True)

>>> 1    45134.088  33.348
>>> 2    34808.743  11.267
>>> 3    34207.242  10.415
>>> 4    33747.553  9.645
>>> 5    33498.612  9.208
>>> 6    33406.624  9.064
>>> 7    33381.469  9.042
>>> 8    33375.651  9.046
>>> 9    33374.371  9.052
>>> 10   33374.077  9.054

We can compare the fitted $\beta$ values to the true $\beta$'s and see that they are in good agreement:

fit_bp.betas.round(2).T
>>> array([[ 0.3 ,  0.21, -0.  ,  0.3 ,  0.  ,  0.02],
>>>        [ 0.5 , -0.  , -0.1 , -0.01, -0.01, -0.5 ],
>>>        [-0.  , -0.01, -0.01, -0.01,  1.  ,  0.  ]])

true_betas.round(2).T
>>> array([[ 0.3,  0.2,  0. ,  0.3,  0. ,  0. ],
>>>        [ 0.5,  0. , -0.1,  0. ,  0. , -0.5],
>>>        [ 0. ,  0. ,  0. ,  0. ,  1. ,  0. ]])

We can also perform a Double Poisson fit and compare the MSE of the two models.

def double_poisson_fit(data: BivariateData) -> FitResult:
    betas = np.array([
        poisson_regression(data.observed.covariates, data.observed.Z0),
        poisson_regression(data.observed.covariates, data.observed.Z1),
    ]).T
    return FitResult(betas)

def double_poisson_prediction(betas, data: ObservedData):
    lambdas = np.exp(data.covariates.dot(betas))
    return np.stack([lambdas[:, 0], lambdas[:, 1]])

fit_dp = double_poisson_fit(dataset)
preds_dp = double_poisson_prediction(fit_dp.betas, dataset.observed)
print(compute_mse(preds_dp, dataset.observed.Z0, dataset.observed.Z1))
>>> 16.204314367381187

which is significantly worse than the 9.054 managed by the Bivariate Poisson regression.
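If we also wanted to compare negative log-likelihoods on a like-for-like basis, one option (a sketch, not part of the fits above) is to evaluate the Double Poisson fit under the bivariate density with $\lambda_2 = 0$:

# Hypothetical extra check: NLL of the double-Poisson fit, evaluated with the
# bivariate density and no shared component (lambda_2 = 0), so it is directly
# comparable to fit_bp.neg_loglikelihood.
lambdas_dp = np.exp(dataset.observed.covariates.dot(fit_dp.betas))  # shape (N, 2)
nll_dp = -np.sum([
    np.log(bivariate_poisson(z0, z1, lams[0], lams[1], 0.0))
    for (z0, z1, lams) in zip(dataset.observed.Z0, dataset.observed.Z1, lambdas_dp)
])
print(nll_dp, fit_bp.neg_loglikelihood)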

References

[1]: Machine Learning: A Probabilistic Perspective (K. P. Murphy)

[2]: Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R

[3]: An introduction to EM

[4]: An application of EM to censored linear regression

[5]: Bayesian Modeling Using WinBUGS