Kernel density estimation for circular functions

28th March, 2014
A circular distribution (blue line) estimated using a kernel density method.

Matlab code for performing kernel density estimates on periodic and aperiodic domains, for unidimensional and multidimensional data.

Estimating multi-dimensional PDFs over periodic domains is reasonably straightforward, but can't be done with Matlab's built-in tools. Here are two functions for kernel density estimates for circular functions. By default they will estimate a PDF over the periodic domain, and can be used with weighted and unweighted data.

Download and installation

Download the source code: circ_ksdensity.zip
Unzip the file into a directory on the Matlab path. Two functions are included: circ_ksdensity and circ_ksdensityn.

Uni-dimensional kernel density estimates — circ_ksdensity

Usage:
[vfEstimate] = circ_ksdensity(vfObservations, vfPDFSamples, <vfDomain, fSigma, vfWeights>)

This function calculates a kernel density estimate of an (optionally weighted) data sample, over a periodic domain.

vfObservations is a set of observations made over a periodic domain, optionally defined by vfDomain: [fMin fMax]. The default domain is [0..2*pi]. vfPDFSamples defines the sample points over which to perform the kernel density estimate, over the same domain as vfObservations.

Weighted estimations can be performed by providing the optional argument vfWeights, where each element in vfWeights corresponds to the matching element in vfObservations.
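As a rough illustration of the calling convention described above, the following sketch builds some made-up angular data and requests an unweighted and a weighted estimate. The data, the 100-point sample grid, the kernel width of 0.2 and the choice to drop the duplicate endpoint at 2*pi are all assumptions made for the example; whether the function prefers row or column vectors is not stated here, so check the function help. The calls are guarded so the snippet only invokes circ_ksdensity if it is on the path.

```matlab
% Hypothetical data: 500 angles clustered around pi/2, wrapped into [0, 2*pi)
rng(1);                                        % reproducible example
vfObservations = mod(pi/2 + 0.5 * randn(500, 1), 2*pi);

% Sample points over the default domain; drop the duplicate endpoint at 2*pi
vfPDFSamples = linspace(0, 2*pi, 101);
vfPDFSamples = vfPDFSamples(1:end-1);

% One (hypothetical) weight per observation
vfWeights = rand(size(vfObservations));

if exist('circ_ksdensity', 'file') == 2
    % Unweighted estimate, default domain, automatic kernel width
    vfEstimate = circ_ksdensity(vfObservations, vfPDFSamples);

    % Weighted estimate, explicit domain and an assumed kernel width of 0.2
    vfWeightedEstimate = circ_ksdensity(vfObservations, vfPDFSamples, ...
                                        [0 2*pi], 0.2, vfWeights);
end
```

All optional arguments are passed explicitly in the weighted call, since this page does not say whether empty placeholders are accepted for skipped arguments.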

The kernel density estimate will be performed using a wrapped Gaussian kernel, with a width estimated as
(4/3)^0.2 * circ_std(vfObservations, vfWeights) * (length(vfObservations)^-0.2)

The optional argument fSigma can be provided to set the width of the kernel.
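To make the idea of a wrapped Gaussian kernel concrete, here is a minimal, self-contained sketch of a periodic kernel density estimate. It is not the code inside circ_ksdensity. The sample data, the truncation of the wrapping sum to seven terms, and the spread measure (an angular deviation computed from the mean resultant length, used as a stand-in for circ_std) are assumptions made for illustration.

```matlab
% Hypothetical sample: 200 angles near 0, so the data straddle the wrap point
rng(2);
vfObs = mod(0.4 * randn(200, 1), 2*pi);
nObs = numel(vfObs);

% Stand-in for circ_std: angular deviation from the mean resultant length
fR = abs(mean(exp(1i * vfObs)));
fSpread = sqrt(2 * (1 - fR));

% Rule-of-thumb kernel width, following the formula above
fSigma = (4/3)^0.2 * fSpread * nObs^-0.2;

% Evaluation grid over [0, 2*pi), endpoint dropped
nGrid = 200;
vfGrid = (0:nGrid-1) * (2*pi / nGrid);

% Wrapped Gaussian: sum the Gaussian over shifts of 2*pi*k, truncated to |k| <= 3
vfEstimate = zeros(1, nGrid);
for k = -3:3
    % nObs-by-nGrid matrix of (shifted) differences between grid and observations
    mfDiff = bsxfun(@minus, vfGrid, vfObs) + 2*pi*k;
    vfEstimate = vfEstimate + sum(exp(-mfDiff.^2 / (2 * fSigma^2)), 1);
end
vfEstimate = vfEstimate / (nObs * fSigma * sqrt(2*pi));

% The estimate should integrate to roughly 1 over one period
fArea = sum(vfEstimate) * (2*pi / nGrid);
```

Because each kernel is wrapped, observations just below 2*pi contribute density just above 0, which a plain Gaussian kernel estimate would miss.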

vfEstimate will be a vector with a (weighted) estimate of the underlying distribution, with an entry for each element of vfPDFSamples. If no weighting is supplied, the estimate will be scaled such that it forms a PDF estimate over the supplied sample domain, taking into account sample bin widths. If a weight vector is supplied then the estimate will be scaled such that the sum over the domain attempts to match the sum of weights, taking into account sample bin widths.
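One reading of the scaling described above, for evenly spaced sample points, can be sketched as follows. The raw curve, the weights and the variable names are hypothetical, and the exact scaling applied by circ_ksdensity may differ in detail (for example for unevenly spaced samples).

```matlab
% Hypothetical unnormalised estimate on 64 evenly spaced sample points
vfPDFSamples = linspace(0, 2*pi, 65);
vfPDFSamples = vfPDFSamples(1:end-1);
vfRaw = 1 + cos(vfPDFSamples - pi);            % any non-negative curve will do
fBinWidth = 2*pi / numel(vfPDFSamples);        % equal bin widths on the periodic domain

% Unweighted case: scale so that sum(estimate * bin width) equals 1
vfPDF = vfRaw / (sum(vfRaw) * fBinWidth);

% Weighted case: scale so that sum(estimate * bin width) equals the total weight
vfWeights = [0.5 2 1.5];                       % hypothetical observation weights
vfWeighted = vfRaw * sum(vfWeights) / (sum(vfRaw) * fBinWidth);
```

The same check, sum(vfEstimate) * fBinWidth, is a quick way to confirm how an estimate returned by the function has been scaled.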

Multi-dimensional kernel density estimates — circ_ksdensityn

Usage:
[vfEstimate, vfBinVol] = circ_ksdensityn(mfObservations, mfPDFSamples, <mfDomains, vfSigmas, vfWeights>)

This function calculates a kernel density estimate of an (optionally weighted) data sample, over periodic and aperiodic domains. The sample is assumed to be independent across dimensions; i.e. density estimation is performed independently for each dimension of the data.

mfObservations is a set of observations made over a (possibly periodic) domain. Each row corresponds to a single observation, each column corresponds to a particular dimension. By default all dimensions are periodic in [0..2*pi]; this can be modified by providing the optional argument mfDomains. Each row in mfDomains is [fMin fMax], one row for each dimension in mfObservations. If a particular dimension should not be periodic, the corresponding row should be [nan nan]. Bounded support over a dimension is NOT implemented; each dimension is either linear and infinite or periodic.

mfPDFSamples defines the sample points over which to perform the kernel density estimate, over the same domains as mfObservations.

Weighted estimations can be performed by providing the optional argument vfWeights, where each element in vfWeights corresponds to the matching observation in mfObservations.
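The following sketch shows one way the inputs described above might be assembled for a two-dimensional case with one periodic and one non-periodic dimension. The data (a direction in radians and a linear quantity), the grid sizes and the use of ndgrid to build mfPDFSamples are assumptions for the example. The call is guarded so it only runs if circ_ksdensityn is on the path.

```matlab
% Hypothetical observations: column 1 is an angle, column 2 is a linear quantity
rng(3);
nObs = 400;
mfObservations = [mod(pi + 0.6 * randn(nObs, 1), 2*pi), 5 + 2 * randn(nObs, 1)];

% Dimension 1 is periodic in [0, 2*pi]; dimension 2 is linear, so [nan nan]
mfDomains = [0 2*pi; nan nan];

% Sample points on a regular grid, one row per point, one column per dimension
vfAngles = (0:31) * (2*pi / 32);
vfLinear = linspace(-2, 12, 40);
[mfA, mfL] = ndgrid(vfAngles, vfLinear);
mfPDFSamples = [mfA(:), mfL(:)];

if exist('circ_ksdensityn', 'file') == 2
    [vfEstimate, vfBinVol] = circ_ksdensityn(mfObservations, mfPDFSamples, mfDomains);

    % Reshape back onto the grid for plotting, e.g. with imagesc or surf
    mfEstimate = reshape(vfEstimate, size(mfA));
end
```

Because the sample points were flattened column-wise from the ndgrid output, reshaping to size(mfA) restores the original grid layout.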

The kernel density estimate will be performed using a multivariate Gaussian kernel, independent along each dimension, and wrapped along the periodic dimensions as appropriate. Kernel widths over periodic dimensions are estimated as
(4/3)^0.2 * circ_std(mfObservations(:, nDim), vfWeights) * (length(mfObservations)^-0.2)

Kernel widths over non-periodic dimensions are estimated as
(4 * std(mfObservations(:, nDim), vfWeights)^5 / 3 / length(mfObservations))^(1/5)

The optional argument vfSigmas can be provided to set the width of each kernel.
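The two width formulas above are the same rule of thumb written in different forms: the non-periodic expression simplifies algebraically to (4/3)^0.2 * sigma * n^-0.2, with std taking the place of circ_std. A small check on made-up, unweighted data (the variable names are hypothetical):

```matlab
% Hypothetical linear data for one non-periodic dimension
rng(4);
vfX = 3 + 1.5 * randn(250, 1);
fStd = std(vfX);
nObs = length(vfX);

% Width as written for non-periodic dimensions
fWidthLinearForm = (4 * fStd^5 / 3 / nObs)^(1/5);

% Same rule in the form used for periodic dimensions, with std replacing circ_std
fWidthPeriodicForm = (4/3)^0.2 * fStd * nObs^-0.2;

% The difference is zero up to floating-point rounding
fDifference = abs(fWidthLinearForm - fWidthPeriodicForm);
```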

vfEstimate will be a vector with a (weighted) histogram estimate of the underlying distribution, with an entry for each point in mfPDFSamples. If no weighting is supplied, the estimate will be scaled to estimate a PDF over the supplied multi-dimensional domain, taking into account the estimated volume of each bin. If a weight vector is supplied, the estimate will be scaled such that the sum over the domain attempts to match the sum of weights, taking into account the multi-dimensional bin volumes.

vfBinVol is a vector containing volume estimates for each row in mfPDFSamples, under the assumption that each dimension is linearly scaled and mutually orthogonal.
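For sample points on a regular grid, the orthogonality assumption means each bin volume is simply the product of the grid spacings along each dimension. The sketch below illustrates that idea only; it is not necessarily how circ_ksdensityn computes vfBinVol, and the grid and variable names are hypothetical.

```matlab
% Regular grid: a periodic dimension with 32 bins and a linear one with 40 bins
vfAngles = (0:31) * (2*pi / 32);               % spacing 2*pi/32
vfLinear = linspace(-2, 12, 41);
vfLinear = vfLinear(1:end-1);                  % 40 left bin edges, spacing 0.35
[mfA, mfL] = ndgrid(vfAngles, vfLinear);
mfPDFSamples = [mfA(:), mfL(:)];

% With orthogonal, linearly scaled axes, each bin volume is the product of spacings
fAngleStep = vfAngles(2) - vfAngles(1);
fLinearStep = vfLinear(2) - vfLinear(1);
vfBinVolSketch = repmat(fAngleStep * fLinearStep, size(mfPDFSamples, 1), 1);

% The bin volumes tile the sampled region: 2*pi wide by 14 long
fTotalVolume = sum(vfBinVolSketch);
```

With volumes like these, sum(vfEstimate .* vfBinVol) gives the total mass of an estimate over the sampled region, which is the quantity the scaling described above refers to.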