I was wondering how one can modify the Expectation-Maximization procedure for fitting mixtures (Gaussian mixtures in particular, since that is the only nontrivial yet fairly general multivariate distribution that can be fitted easily) so that it supports a very large number of overlapping components in the mixture.

Randomization could be a solution to this problem. The expectation step of the EM algorithm seems like the proper place to introduce it, but in that case the maximization step should exhibit some momentum-like behavior.

It is probably a good idea to recall how EM works (a minimal code sketch follows the list below). There are two steps, computed iteratively:

  1. (Expectation) where we compute, for each event, the probability that it belongs to each component of the mixture
  2. (Maximization) where, given these probabilities, we re-estimate the parameters of each component by maximizing the weighted likelihood.
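
To make these two steps concrete, here is a minimal sketch of vanilla EM for a Gaussian mixture, using only numpy and scipy. All names (`fit_gmm`, `n_components`, etc.) are illustrative and not taken from any particular library:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, n_components, n_iter=100, seed=0):
    """A bare-bones EM loop for a Gaussian mixture; X has shape (n_events, n_dims)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # crude initialization: random events as means, shared covariance, equal weights
    means = X[rng.choice(n, n_components, replace=False)]
    covs = np.array([np.atleast_2d(np.cov(X.T)) + 1e-6 * np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibilities, i.e. the probability that each event
        # belongs to each component
        resp = np.stack([
            w * multivariate_normal.pdf(X, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs)
        ], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters of each component
        # from the responsibility-weighted data
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(d)

    return weights, means, covs
```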

What if we sample events according to the distribution obtained at the expectation step? At each iteration we would attribute each event to one component of the mixture (in the simplest case), or perhaps to several of them. This kind of randomization should prevent the components from 'shrinking'.
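
Below is a sketch of this sampled E-step under the simplest assumption (each event attributed to exactly one component): the soft responsibilities are replaced by one-hot rows drawn from them, and the M-step then sees hard 0/1 weights. This is close to what is usually called stochastic EM; the function names are mine, not from any library:

```python
import numpy as np

def sample_assignments(resp, rng):
    """Draw one component index per event from its row of responsibilities."""
    cumulative = np.cumsum(resp, axis=1)
    u = rng.random((resp.shape[0], 1))
    # number of cumulative values the uniform draw exceeds = sampled index
    return (u > cumulative).sum(axis=1)

def hard_responsibilities(resp, rng):
    """Replace soft responsibilities with one-hot rows sampled from them."""
    z = sample_assignments(resp, rng)
    hard = np.zeros_like(resp)
    hard[np.arange(len(z)), z] = 1.0
    return hard
```

In the EM loop sketched earlier, one would simply feed `hard_responsibilities(resp, rng)` into the M-step instead of `resp`, so each component is re-fitted only on the events currently assigned to it.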

The core idea I am trying to introduce here is very similar to dropout, a trick which allowed researchers to train neural networks with more parameters than the number of observations available.