Bivariate Poisson Regression with Expectation-Maximisation
October 1, 2022 · 3164 words · Jack Medley
Count data is commonly modelled using a generalised linear model with a Poisson likelihood, and this approach extends trivially to the case of multiple count random variables, $Z_i \in \mathbb{Z}_{\geq 0}$, $i = 1 \dots N$, which are independent of one another, $Z_i \perp Z_j \; \forall i \neq j$, because the likelihood factorises nicely. But what about the case where there is some correlation?
In this short post I’ll describe the case in which we have two correlated count variables and we believe that a linear function of some known covariates can explain variation in the data.
The Poisson distribution is specified by a single parameter, $\lambda_i \in \mathbb{R}_{>0}$, which turns out to be both its mean and its variance. Its PMF for an observation, $z_i$, is given by
$$\mathrm{Po}(z_i \mid \lambda_i) = \frac{\lambda_i^{z_i} e^{-\lambda_i}}{z_i!}.$$
Assuming the data is IID the log-likelihood is given by:
$$\mathcal{L}(\lambda \mid z) = \sum_{i=1}^{n} \left[ z_i \log(\lambda_i) - \lambda_i - \log(z_i!) \right].$$
The last piece we need is a 'link function': there are a couple of options we can use to relate the Poisson parameter to our model parameters, $\beta_j$, and the observed covariates, $x_i$, but the common choice is the log-link:
$$\log \lambda_i = \beta \cdot x_i$$
Typically we would take derivatives of the log-likelihood with respect to the $\beta_j$ using the chain rule and use the iteratively reweighted least squares algorithm (see page 294 of [1] for details on that approach), but here we can be lazy and let scipy's `minimize` do the hard work for us (this will require more function evaluations than if we provided the derivative of the negative log-likelihood and used a gradient-based method such as L-BFGS, but everything here runs in O(seconds) on my laptop anyway).
We’re nearly ready to start fitting some models, but first some housekeeping to make our life easier later:
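That housekeeping block isn't reproduced in this extract, so here is a minimal sketch of the imports and container types that the later snippets assume; the exact field layout is inferred from how `Dims`, `ObservedData`, `PoissonData`, and `FitResult` are used below:

```python
from dataclasses import dataclass

import numpy as np
from scipy import optimize


@dataclass
class Dims:
    N: int  # number of observations
    M: int  # number of covariates


@dataclass
class ObservedData:
    covariates: np.ndarray
    # A single target array in the univariate case; the bivariate case below
    # passes two target arrays (z0 and z1)
    z0: np.ndarray
    z1: np.ndarray = None


@dataclass
class PoissonData:
    dims: Dims
    observed: ObservedData


@dataclass
class FitResult:
    betas: np.ndarray
    nll: float
```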
We can generate some fake data by drawing random covariates, computing the true model lambdas from some chosen true $\beta$'s, and then sampling from the resulting Poisson distributions:
```python
def generate_poisson_data(true_betas: np.array, N: int, sigma: float = 1) -> PoissonData:
    M = true_betas.shape[0]
    covariates = np.random.normal(0, sigma, (N, M))
    # Fix the first column to allow for a constant term in the regression
    covariates[:, 0] = 1
    lambdas = np.exp(covariates.dot(true_betas))
    poisson_draws = np.random.poisson(lambdas)
    return PoissonData(Dims(N, M), ObservedData(covariates, poisson_draws))
```
and, as above, in order to fit the model we need an expression for the negative log-likelihood (negative so that the problem becomes a minimisation). For neatness, and further use, we'll also want a little wrapper around scipy's `minimize` function:
```python
def poisson_nll(betas, covariates, targets):
    loglambda = covariates.dot(betas)
    lambda_ = np.exp(loglambda)
    # The log(z!) term is dropped: it doesn't depend on the betas
    return -np.sum(targets * loglambda - lambda_)


def poisson_regression(covariates, targets):
    betas = np.ones(covariates.shape[1])
    r = optimize.minimize(poisson_nll, betas, args=(covariates, targets), options={'gtol': 1e-2})
    assert r.success, f"Poisson Regression failed with status: {r.status} ({r.message})"
    return r.x
```
and with these we can generate, and fit, some Poisson data from the following model:
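The model specification itself isn't reproduced in this extract; as a stand-in, a sketch along these lines exercises `generate_poisson_data` and `poisson_regression` (the particular $\beta$ values are illustrative, not the post's originals):

```python
true_betas = np.array([0.5, 0.2, -0.3, 0.1])  # illustrative values

dataset = generate_poisson_data(true_betas, N=10_000)
fitted_betas = poisson_regression(dataset.observed.covariates, dataset.observed.z0)

# Columns: true beta, fitted beta
print(np.column_stack([true_betas, fitted_betas]))
```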
We can see that our fit is in good agreement with statsmodels, but we also get lots of other interesting information (coefficient uncertainty, NLL, deviance, etc.).
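For reference, the statsmodels fit can be set up along these lines (a sketch; the post's exact call isn't shown in this extract):

```python
import statsmodels.api as sm

# GLM with a Poisson family uses the log link by default
sm_fit = sm.GLM(dataset.observed.z0, dataset.observed.covariates,
                family=sm.families.Poisson()).fit()
print(sm_fit.summary())
```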
Correlations between two Poisson random variables
As mentioned in the introduction, if we have two Poisson random variables, $Z_1$ and $Z_2$, with no correlation (known as the 'Double Poisson' model) then the above approach still holds, since the log-likelihood factorises:

$$\mathcal{L}(\lambda \mid z) = \sum_{i=1}^{n} \left[ z_{1i} \log(\lambda_{1i}) - \lambda_{1i} - \log(z_{1i}!) \right] + \sum_{i=1}^{n} \left[ z_{2i} \log(\lambda_{2i}) - \lambda_{2i} - \log(z_{2i}!) \right]$$
and to maximise this we can just maximise the two sums separately. When doing so, all of the derivative cross-terms, $\partial_{\beta_n} \lambda_m$, are zero for $n \neq m$ because of our 'no correlation' stipulation.
But what if we suspect that our two random variables are correlated?
The Bivariate Poisson model deals with exactly this case – let $Y_0, Y_1, Y_2$ be independent univariate Poisson random variables with parameters $\lambda_0, \lambda_1, \lambda_2$ respectively. Then
$$Z_0 = Y_0 + Y_2, \qquad Z_1 = Y_1 + Y_2$$
are bivariate Poisson distributed. Clearly $Z_0$ and $Z_1$ are marginally Poisson and therefore we have:
$$\mathbb{E}[Z_0] = \lambda_0 + \lambda_2, \qquad \mathbb{E}[Z_1] = \lambda_1 + \lambda_2$$
since the sum of two independent Poisson random variables is another Poisson whose parameter is the sum of the two input parameters. But more interestingly we also have

$$\mathrm{Cov}(Z_0, Z_1) = \lambda_2,$$

and the joint PMF of the bivariate Poisson is

$$\mathrm{BP}(z_0, z_1 \mid \lambda_0, \lambda_1, \lambda_2) = e^{-(\lambda_0 + \lambda_1 + \lambda_2)} \frac{\lambda_0^{z_0}}{z_0!} \frac{\lambda_1^{z_1}}{z_1!} \sum_{k=0}^{\min(z_0, z_1)} \binom{z_0}{k} \binom{z_1}{k} k! \left( \frac{\lambda_2}{\lambda_0 \lambda_1} \right)^k.$$
We can see that when $\lambda_2 \to 0$ this reduces to the Double Poisson model, since only the $k = 0$ term in the sum contributes, and it contributes 1, leaving just the prefactor.
In the regression setting, each Poisson parameter is linked to the covariates as before,

$$\log \lambda_{ki} = \beta_k \cdot x_i,$$

where $i$ indexes over the paired datapoints and $k = 0, 1, 2$ indexes the components. It is possible that the $\lambda_k$ depend on different (or even disjoint) subsets of the covariates, but we can always treat this case using a single $x_i$ created by concatenating the different features and enforcing that the pertinent components of $\beta$ are zero.
Fitting the Bivariate Poisson with the EM algorithm
We saw above that we can directly maximise the likelihood of a univariate Poisson distribution, but the bivariate Poisson case is more awkward: the unobserved nature of $Y_2$ means that there is no neat closed-form expression for the MLE.
Instead we can follow the approach taken in the Karlis and Ntzoufras paper [2] and take an Expectation-Maximisation approach [3][4]. I also recommend Ntzoufras’s book which describes lots of interesting models [5].
We can write down a seemingly useless form for the bivariate model, one expressed in terms of all three of the unobserved Poisson components:
$$f(y_0, y_1, y_2 \mid \Theta) = \prod_{i=1}^{n} \prod_{k=0}^{2} \frac{e^{-\lambda_{ki}} \lambda_{ki}^{y_{ki}}}{y_{ki}!}$$
We can perform a change of variables so that our model is in terms of the two observed random variables, $(Z_0, Z_1)$, and the single latent variable $Y_2$. The determinant of the Jacobian of the transformation is 1, so we have

$$f(z_0, z_1, y_2 \mid \Theta) = \prod_{i=1}^{n} e^{-(\lambda_{0i} + \lambda_{1i} + \lambda_{2i})} \frac{\lambda_{0i}^{z_{0i} - y_{2i}}}{(z_{0i} - y_{2i})!} \frac{\lambda_{1i}^{z_{1i} - y_{2i}}}{(z_{1i} - y_{2i})!} \frac{\lambda_{2i}^{y_{2i}}}{y_{2i}!}.$$
This is almost the same as the 'Double Poisson' model, and we know we can maximise it with respect to the $\lambda$'s, except that we don't know the values of the $y_{2i}$. This is where the Expectation-Maximisation algorithm comes in.
To get the Q function we need to compute the expectation of the log-likelihood with respect to the density $f(Y_2 \mid Z_0, Z_1, \lambda_k^{(t)})$, where the $\lambda_k^{(t)}$ are our current guesses for the lambdas. Using Bayes' theorem we can write this as:

$$f(y_2 \mid z_0, z_1, \lambda_k^{(t)}) = \frac{f(z_0, z_1 \mid y_2, \lambda_k^{(t)}) \, f(y_2 \mid \lambda_k^{(t)})}{f(z_0, z_1 \mid \lambda_k^{(t)})}$$
The denominator is just the bivariate Poisson density and $f(y_2 \mid \lambda_k^{(t)})$ is just the Poisson density. The last term, $f(z_0, z_1 \mid y_2, \lambda_k^{(t)})$, is a little harder to reason about, but if we look at the definitions of $Z_0$ and $Z_1$ and imagine that $Y_2$ is just a constant, we see that

$$f(z_0, z_1 \mid y_2, \lambda_k^{(t)}) = \mathrm{Po}(z_0 - y_2 \mid \lambda_0^{(t)}) \, \mathrm{Po}(z_1 - y_2 \mid \lambda_1^{(t)}).$$

Putting the pieces together, the expectation we need is

$$y_2^{(t)} = \mathbb{E}[Y_2 \mid z_0, z_1, \lambda^{(t)}] = \frac{\sum_{j=0}^{\infty} j \, \mathrm{Po}(z_0 - j \mid \lambda_0^{(t)}) \, \mathrm{Po}(z_1 - j \mid \lambda_1^{(t)}) \, \mathrm{Po}(j \mid \lambda_2^{(t)})}{f(z_0, z_1 \mid \lambda^{(t)})}.$$
The denominator is independent of our index, j, and we can change the bounds of the summation in two ways:
Firstly, we can use the fact that the $j = 0$ term in the sum is clearly zero (because of the factor of $j$) to start the summation at $j = 1$.
Secondly, we see that for any $j > z_0$ or $j > z_1$ the Poisson density is zero (the domain of the Poisson is the non-negative integers). Therefore the upper bound of the sum is $\min(z_0, z_1)$.
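Jumping ahead to the implementation for a moment, this conditional expectation is exactly what the E-step has to compute. A sketch of `compute_next_y2ti` (the name used in the fitting loop below; the post's own implementation isn't reproduced in this extract) might look like:

```python
from scipy import stats


def compute_next_y2ti(observed: ObservedData, lambdas: np.ndarray) -> np.ndarray:
    """E-step: posterior expectation of the latent Y2 for each datapoint.

    `lambdas` has shape (N, 3); its columns are lambda_0, lambda_1, lambda_2.
    """
    z0, z1 = observed.z0, observed.z1
    y2ti = np.zeros(len(z0))
    for i in range(len(z0)):
        js = np.arange(0, min(z0[i], z1[i]) + 1)
        # Joint terms f(z0, z1, y2 = j | lambda); their sum over j is the
        # bivariate Poisson density, i.e. the normalising denominator
        terms = (stats.poisson.pmf(z0[i] - js, lambdas[i, 0])
                 * stats.poisson.pmf(z1[i] - js, lambdas[i, 1])
                 * stats.poisson.pmf(js, lambdas[i, 2]))
        y2ti[i] = np.sum(js * terms) / np.sum(terms)
    return y2ti
```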
The Q function is then the expectation of the complete-data log-likelihood under this density:

$$Q(\lambda \mid \lambda^{(t)}) = \sum_{i=1}^{n} \left[ (z_{0i} - y_{2i}^{(t)}) \log \lambda_{0i} - \lambda_{0i} + (z_{1i} - y_{2i}^{(t)}) \log \lambda_{1i} - \lambda_{1i} + y_{2i}^{(t)} \log \lambda_{2i} - \lambda_{2i} \right]$$

where the terms which don't depend on the $\lambda_k$ have been dropped. Remembering for a moment that the lambdas are themselves functions of the regression weights:
$$\log \lambda_{ki} = \beta_k \cdot x_i$$
we can see that maximising the Q function amounts to performing three separate Poisson regressions to compute the weights $\beta$ (they are written out in the M-step below). In summary, then, the EM iterations look like this:
E-step: Using our current best guess for the model weights, $\beta^{(t-1)}$, we compute $y_{2i}^{(t)}$ for each datapoint via the conditional expectation derived above.
M-step: We perform the following three Poisson regressions
$$\begin{aligned} z_{0i} - y_{2i}^{(t)} &\sim \mathrm{Po}(\lambda_{0i}), \quad \log \lambda_{0i} = \beta_0 \cdot x_i \\ z_{1i} - y_{2i}^{(t)} &\sim \mathrm{Po}(\lambda_{1i}), \quad \log \lambda_{1i} = \beta_1 \cdot x_i \\ y_{2i}^{(t)} &\sim \mathrm{Po}(\lambda_{2i}), \quad \log \lambda_{2i} = \beta_2 \cdot x_i \end{aligned}$$
to find our next estimates for the model weights, $\beta^{(t)}$.
And now let's implement all of this in Python. We start by generating the latent $Y_0, Y_1, Y_2$ data from independent Poissons and then combining them to form the $Z_0, Z_1$:
```python
@dataclass
class LatentData:
    lambdas: np.array
    Y0: np.array
    Y1: np.array
    Y2: np.array


@dataclass
class BivariateData:
    dims: Dims
    latent: LatentData
    observed: ObservedData


def generate_observed_bivariate_poisson_data(YY_latent: np.array) -> np.array:
    return np.stack([
        YY_latent[0, :] + YY_latent[2, :],
        YY_latent[1, :] + YY_latent[2, :],
    ])


def generate_bivariate_poisson_data(true_betas: np.array, N: int, sigma: float) -> BivariateData:
    M = true_betas.shape[0]
    covariates = np.random.normal(0, sigma, (N, M))
    # Fix the first column to allow for a constant term in the regression
    covariates[:, 0] = 1
    # Build the lambdas from the betas, and then sample from the Poissons
    lambdas = np.exp(covariates.dot(true_betas))
    latent_poisson_draws = np.random.poisson(lambdas)
    observations = generate_observed_bivariate_poisson_data(latent_poisson_draws.T)
    return BivariateData(
        Dims(N, M),
        LatentData(lambdas, *latent_poisson_draws.T),
        ObservedData(covariates, *observations),
    )


true_betas = np.array([
    [0.3, 0.2, 0.0, 0.3, 0.0, 0.0],
    [0.5, 0.0, -0.1, 0.0, 0.0, -0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
]).T

dataset = generate_bivariate_poisson_data(true_betas, N=10000, sigma=1)
```
We can check the correlation matrix of the latent data, and the observed data to see that this has worked as expected:
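A quick way to run that check (a sketch; output omitted):

```python
# The independent latent components should be roughly uncorrelated, while the
# observed Z0 and Z1 share Y2 and so should show a clear positive correlation
print(np.corrcoef(np.stack([dataset.latent.Y0, dataset.latent.Y1, dataset.latent.Y2])))
print(np.corrcoef(np.stack([dataset.observed.z0, dataset.observed.z1])))
```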
For the M-step we can re-use the 1D Poisson regression code from earlier in this post to compute the betas, and from the betas we get our new estimate for the lambdas:
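The post's definitions of these helpers aren't included in this extract; the following sketches are consistent with how they are called in the fitting loop below:

```python
def compute_next_betas(dataset: BivariateData, y2ti: np.ndarray) -> np.ndarray:
    # M-step: three independent univariate Poisson regressions; the pseudo-targets
    # for the first two subtract the current expectation of the latent Y2
    covariates = dataset.observed.covariates
    return np.stack([
        poisson_regression(covariates, dataset.observed.z0 - y2ti),
        poisson_regression(covariates, dataset.observed.z1 - y2ti),
        poisson_regression(covariates, y2ti),
    ], axis=1)


def compute_next_lambdas(betas: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    # Log-link: one column of lambdas per component, shape (N, 3)
    return np.exp(covariates.dot(betas))
```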
That is everything we really need to fit this model, but to make the output more interesting we should include a little extra code to track the negative log-likelihood of the model, and the mean squared error to see that the EM algorithm is converging:
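The exact definitions of the two tracking helpers aren't shown in this extract either; plausible sketches, matching the names used in the loop below, are:

```python
def bivariate_poisson_nll(betas: np.ndarray, observed: ObservedData) -> float:
    # Negative log-likelihood under the bivariate Poisson density, computed by
    # summing the joint terms over the feasible values of the latent Y2
    lambdas = compute_next_lambdas(betas, observed.covariates)
    nll = 0.0
    for i in range(len(observed.z0)):
        js = np.arange(0, min(observed.z0[i], observed.z1[i]) + 1)
        density = np.sum(stats.poisson.pmf(observed.z0[i] - js, lambdas[i, 0])
                         * stats.poisson.pmf(observed.z1[i] - js, lambdas[i, 1])
                         * stats.poisson.pmf(js, lambdas[i, 2]))
        nll -= np.log(density)
    return nll


def compute_bivariate_poisson_data_mse(betas: np.ndarray, observed: ObservedData) -> float:
    # Mean squared error between the observed counts and the model means,
    # using E[Z0] = lambda_0 + lambda_2 and E[Z1] = lambda_1 + lambda_2
    lambdas = compute_next_lambdas(betas, observed.covariates)
    pred_z0 = lambdas[:, 0] + lambdas[:, 2]
    pred_z1 = lambdas[:, 1] + lambdas[:, 2]
    return np.mean((observed.z0 - pred_z0) ** 2 + (observed.z1 - pred_z1) ** 2)
```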
```python
def bivariate_poisson_fit_using_em(
    dataset: BivariateData, em_max_itr: int = 20, logging: bool = False
) -> FitResult:
    # Construct some initial guesses for the lambdas and betas
    betas = np.zeros((dataset.dims.M, 3))
    lambdas = compute_next_lambdas(betas, dataset.observed.covariates)
    for itr in range(1, em_max_itr + 1):
        nll = bivariate_poisson_nll(betas, dataset.observed)
        bivpoisson_mse = compute_bivariate_poisson_data_mse(betas, dataset.observed)
        if logging:
            print(f"{itr:<4}{nll:<10.3f}{bivpoisson_mse:.3f}")
        # E-step:
        y2ti = compute_next_y2ti(dataset.observed, lambdas)
        # M-step:
        betas = compute_next_betas(dataset, y2ti)
        lambdas = compute_next_lambdas(betas, dataset.observed.covariates)
    return FitResult(betas, nll)


fit_bp = bivariate_poisson_fit_using_em(dataset, logging=True)
```

```
>>> 1   45134.088 33.348
>>> 2   34808.743 11.267
>>> 3   34207.242 10.415
>>> 4   33747.553 9.645
>>> 5   33498.612 9.208
>>> 6   33406.624 9.064
>>> 7   33381.469 9.042
>>> 8   33375.651 9.046
>>> 9   33374.371 9.052
>>> 10  33374.077 9.054
```
We can compare the fitted $\beta$ values to the true $\beta$'s and see that they are in good agreement:
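One way to eyeball that agreement (a sketch; the post's own comparison output isn't reproduced here):

```python
# Compare each column of the fitted betas against the corresponding true values
for k in range(3):
    print(f"beta_{k}: true={true_betas[:, k]}, fitted={np.round(fit_bp.betas[:, k], 2)}")
```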