Pomegranate: Question: Sampling from conditional probability

Created on 28 Feb 2016  ·  18 Comments  ·  Source: jmschrei/pomegranate

Hello,
I don't know if this is the appropriate place to ask this, please tell me if it isn't.
For my master's thesis I would like to estimate conditional probabilities of multiple random variables, which can be both discrete and continuous (a mixture of continuous and discrete variables). I think this could be done with pomegranate by doing density estimation on this mixture and on the joint distribution of the evidence variables. The result would be a ratio of two mixture distributions, not a closed-form distribution. Thus, to sample from the conditional distribution I would have to resort to rejection sampling (or an optimised MCMC method). Is this correct?
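For what it's worth, the rejection-sampling idea can be sketched on a toy joint density. Everything below (the bivariate Gaussian joint, the proposal, the envelope constant M) is illustrative and not pomegranate code; the point is only that the joint density evaluated at the evidence is enough to sample the conditional:

```python
# Rejection sampling from p(y | x = x0) using only the joint density
# p(x, y).  The joint is a toy bivariate Gaussian; names are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.8], [0.8, 1.0]])

def sample_conditional(x0, n, scale=3.0, M=2.0):
    """Draw n samples from p(y | x = x0) ∝ p(x0, y) by rejection,
    proposing from a wide N(0, scale^2) with envelope M * q(y)."""
    out = []
    while len(out) < n:
        y = rng.normal(0.0, scale)                                  # proposal draw
        q = np.exp(-y**2 / (2 * scale**2)) / (scale * np.sqrt(2 * np.pi))
        # accept with probability p(x0, y) / (M * q(y))
        if rng.uniform() < joint.pdf([x0, y]) / (M * q):
            out.append(y)
    return np.array(out)

ys = sample_conditional(x0=1.0, n=1000)
# for this Gaussian joint, the true conditional mean is 0.8 * x0 = 0.8
print(ys.mean())
```

The envelope constant has to dominate the ratio everywhere; for non-Gaussian mixtures that is exactly where rejection sampling gets inefficient and MCMC becomes attractive.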


All 18 comments

pomegranate currently doesn't support distributions that are conditional on continuous distributions, unfortunately. It's something I want to add soon, though, and the idea sounds plausible. You might consider PyMC, which is more fully fledged in this area.

Then, if I may, a suggestion. PyMC also supports mixture distributions, HMMs, etc., but there are no factory methods to create them. It would be great if pomegranate (which has a very clear interface) were able to integrate with the (optimised) sampling methods of PyMC, perhaps by converting a pomegranate model into a PyMC model.

That seems like a reasonable idea. However, if I had the time to set that up, I'd also have the time to implement it in pomegranate. I'll actually have some time today to work on these issues. If you can provide a sample code snippet that you'd like to see work, I can see if I can get it implemented soon.

I haven't used pomegranate yet; I am in the phase of searching for the most complete library for my problem, and with conditional distributions it would have it all. What I do have is my current code, in which I do exactly what I mentioned in my first post, but using only a mixture of Gaussians (for which a closed-form derivation exists).

edit: I'll show a snippet of how I use it now with sklearn's GMM:

# weighted_fit is an extended version of sklearn's fit() that handles weighted data
gmm.weighted_fit(data, weight_data)

# my data consists of 3-dimensional continuous vectors [X, Y, Z]
z = 1.2

# condition on Z = z, i.e. compute P(X, Y | Z=z)
(con_mean, con_cov, con_weights) = utility.cond_dist_gmm(np.array([np.nan, np.nan, z]), gmm.means_, gmm._get_covars(), gmm.weights_)
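The per-component computation a helper like cond_dist_gmm performs is the classic closed-form conditioning of a multivariate Gaussian. A minimal sketch for a single component (function and variable names are illustrative, not from the snippet above), using NaN to mark the free coordinates as in the snippet:

```python
# Condition a single multivariate Gaussian N(mean, cov) on the
# coordinates of `observed` that are not NaN, returning the mean and
# covariance of the remaining (free) coordinates.
import numpy as np

def condition_gaussian(mean, cov, observed):
    observed = np.asarray(observed, dtype=float)
    free = np.isnan(observed)               # coordinates we keep
    obs = ~free                             # coordinates we condition on
    m1, m2 = mean[free], mean[obs]
    S11 = cov[np.ix_(free, free)]
    S12 = cov[np.ix_(free, obs)]
    S22 = cov[np.ix_(obs, obs)]
    K = S12 @ np.linalg.inv(S22)            # regression coefficients
    cond_mean = m1 + K @ (observed[obs] - m2)
    cond_cov = S11 - K @ S12.T
    return cond_mean, cond_cov

mean = np.array([0.0, 0.0, 0.0])
cov = np.array([[1.0, 0.2, 0.5],
                [0.2, 1.0, 0.3],
                [0.5, 0.3, 1.0]])
# P(X, Y | Z = 1.2): mark X and Y as free with NaN
m, C = condition_gaussian(mean, cov, [np.nan, np.nan, 1.2])
print(m)   # conditional mean of [X, Y]
```

For a full GMM one applies this to each component and additionally reweights the mixture weights by each component's marginal density at the evidence.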

Illustrative pseudocode for what I need to add in. Basically, if you have something like this:

a = NormalDistribution( 5, 2 )
b = ConditionalGaussianDistribution( w=[5], w0=2, sigma=2 )

That would specify a parent distribution which is a normal, and a conditional normal distribution centered around a linear regression of the parents. Whatever would be most convenient for you.
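The generative process that pseudocode describes (a linear-Gaussian child of a Gaussian parent) can be sketched with plain NumPy; ConditionalGaussianDistribution is a proposed name, not an existing pomegranate API:

```python
# Sampling from the linear-Gaussian pair sketched above:
# parent a ~ N(5, 2), child b | a ~ N(w * a + w0, sigma).
import numpy as np

rng = np.random.default_rng(1)
w, w0, sigma = 5.0, 2.0, 2.0

a = rng.normal(5.0, 2.0, size=10_000)     # parent samples
b = rng.normal(w * a + w0, sigma)         # child given each parent sample

print(a.mean(), b.mean())   # b's mean should be near w * 5 + w0 = 27
```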

I'm not sure exactly what you're asking. GMMs are implemented in pomegranate (and I think faster than sklearn). Are you looking for Conditional Gaussian Mixture Models?

Conditional GMMs I already have.
I'll phrase it differently: given data with both discrete and continuous variables, e.g. d=[c1,c2,d1,d2], where c1, c2 are continuous and d1, d2 are discrete.
I would like to be able to do this:

a k-component mixture model to model the continuous data; each multivariate Gaussian is 2-dimensional to model the vector [c1,c2]

gmm=MixtureDistribution(MultivariateGaussianDistribution1, MultivariateGaussianDistribution2, ..., MultivariateGaussianDistributionk)

discrete distributions

cat1 = DiscreteDistribution({0: 0.1, 1: 0.3, 2: 0.6})
cat2 = DiscreteDistribution({0: 0.6, 1: 0.4})

a new model defines the joint distribution of both continuous and discrete data -> p(d)

I don't think this is possible in pomegranate yet.

I think that is maybe why it wasn't exactly clear what I intended; I mixed up "mixture distribution" with "mixed joint distribution"...

see the "mixed case" at the bottom of the Joint distribution page for the mathematical formulation

d_model=JointDistribution( gmm,cat1,cat2)

fit the model parameters with data

d_model.from_sample(data)

Now, given that I know (approximately) the joint probability, I would like to know, for example, the joint distribution of [c1,c2] given that d1=1 and d2=0:

cond_d = ConditionalDistr(d_model,[np.nan,np.nan,1,0] )
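One common way to make the mixed case concrete: let each mixture component factor into a Gaussian over the continuous variables and categorical tables over the discrete ones. Conditioning on discrete evidence then only reweights the components, while each component's Gaussian over [c1,c2] is unchanged. A sketch under that assumption (all names and numbers hypothetical; ConditionalDistr and JointDistribution above are proposed APIs, not existing ones):

```python
# Conditioning a mixed joint distribution on discrete evidence, assuming
# each component factors as
#   p_k(c, d) = N(c; mu_k, cov_k) * Cat(d1; t1_k) * Cat(d2; t2_k).
import numpy as np

# two hypothetical components over the continuous vector [c1, c2]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
weights = np.array([0.5, 0.5])
# per-component categorical tables for d1 (3 values) and d2 (2 values)
t1 = np.array([[0.1, 0.3, 0.6],
               [0.6, 0.3, 0.1]])
t2 = np.array([[0.6, 0.4],
               [0.2, 0.8]])

def condition_on_discrete(d1, d2):
    """Mixture weights of p(c1, c2 | d1, d2): the Gaussians in
    `means`/`covs` stay the same, only the weights change."""
    post = weights * t1[:, d1] * t2[:, d2]
    return post / post.sum()

new_w = condition_on_discrete(d1=1, d2=0)
print(new_w)  # reweighted Gaussian mixture over [c1, c2]
```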

I'm sorry, I'm still not perfectly understanding what you mean. Are you saying that you want to be able to define a conditional distribution that has both continuous and discrete parents? Basically, a Bayesian network with both discrete and continuous variables in it?

Yes, that would be it, I think. But I did some more research, and I will avoid any form of sampling method in favour of EM, even if this reduces the expressiveness of my model. So if you would implement a conditional Gaussian mixture model (my own implementation doesn't work properly), that would be greatly appreciated. I can provide the paper I used, but you probably have all the resources you need.

Sure, I'd like to look at the paper just to make sure we're on the same page.

There are two approaches I've found.
The one I tried: Appendix A in "Gaussian Processes for Machine Learning",
or Chapter 2 in "Conditional Gaussian Mixture Models for Environmental Risk Mapping".

Did you make progress towards the conditional distribution?

I haven't had much time to get into something as in depth as this yet, prospective student visit days are this week. I'll return to this issue soon.

This completely fell off my radar, I'm sorry! If I worked on this soon, would it be of use to you, or did you complete your thesis?

No problem.
And thank you for getting back to me, but my thesis is finished.
Good luck with your library.

Dear @jmschrei, what is your progress on this topic? I would also be highly interested. Thanks!

Hi @jaSunny. Can you describe exactly what it is you'd like implemented?

@jmschrei: Sure! The sample() method would be awesome, or a possibility to migrate the pomegranate model to PyMC so sampling can be done there. :)
