As a special case of finite mixture models, kernel density estimation (KDE) places a Gaussian at each of the $N$ data points and averages them. The covariance of each Gaussian must be chosen as a parameter, and has a large impact on the result.

$p_{kde}(x) = \frac{1}{N}\sum_{i=1}^N \mathcal{N}(x;x_i,\Sigma)$
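The formula above can be sketched directly in numpy. This is an illustrative implementation (the function name `kde_density` is our own), not a library API; for real use, `scipy.stats.gaussian_kde` or `sklearn.neighbors.KernelDensity` are standard choices.

```python
import numpy as np

def kde_density(x, data, sigma2):
    """Evaluate p_kde(x) = (1/N) * sum_i N(x; x_i, sigma2 * I).

    x: query point of shape (d,); data: array of shape (N, d);
    sigma2: scalar bandwidth (variance of each Gaussian component).
    """
    data = np.atleast_2d(data)
    n, d = data.shape
    diff = data - x                       # (N, d) differences x - x_i
    sq_dist = np.sum(diff**2, axis=1)     # squared Euclidean distances
    norm = (2 * np.pi * sigma2) ** (-d / 2)  # isotropic Gaussian normalizer
    return norm * np.mean(np.exp(-sq_dist / (2 * sigma2)))
```

Because each component is a normalized Gaussian and the weights are $1/N$, the result integrates to one over $x$.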

$\Sigma$ is often chosen to be isotropic, $\sigma^2 I$. The scale $\sigma^2$ is referred to as the **bandwidth** and can be selected by cross-validation.
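One concrete way to cross-validate the bandwidth is leave-one-out likelihood: score each candidate $\sigma^2$ by how well the KDE built from the other $N-1$ points predicts each held-out point, and keep the best scorer. A minimal sketch, assuming an isotropic kernel (function name and candidate grid are our own):

```python
import numpy as np

def loo_log_likelihood(data, sigma2):
    """Leave-one-out log-likelihood of the data under the KDE.

    For each x_i, evaluate the KDE built from the remaining N-1 points;
    the bandwidth maximizing the total log-likelihood is the CV choice.
    """
    data = np.atleast_2d(data)
    n, d = data.shape
    # (N, N) matrix of squared pairwise distances
    sq_dist = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    k = (2 * np.pi * sigma2) ** (-d / 2) * np.exp(-sq_dist / (2 * sigma2))
    np.fill_diagonal(k, 0.0)          # exclude x_i from its own estimate
    p_loo = k.sum(axis=1) / (n - 1)   # leave-one-out density at each x_i
    return np.sum(np.log(p_loo))

# Pick the bandwidth with the highest held-out likelihood.
data = np.random.default_rng(0).normal(size=(200, 1))
candidates = [0.01, 0.05, 0.1, 0.5, 1.0]
best = max(candidates, key=lambda s2: loo_log_likelihood(data, s2))
```

Very small bandwidths are penalized because each held-out point falls far (in kernel units) from the remaining points; very large ones are penalized because the density is oversmoothed.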

KDE doesn't scale well to high dimensions: the number of data points required for a given accuracy grows exponentially in the dimension $d$, i.e. $O(e^d)$. It can still work well in low dimensions, though.