# Kernel density estimation for circular functions

28th March, 2014

Matlab code for performing kernel density estimates on periodic and aperiodic domains, for unidimensional and multidimensional data.

Estimating multi-dimensional PDFs over periodic domains is reasonably straightforward, but can't be done with Matlab's built-in tools. Here are two functions for kernel density estimates for circular functions. By default they will estimate a PDF over the periodic domain, and can be used with weighted and unweighted data.

# Download and installation

Download the source code: circ_ksdensity.zip

Unzip the file into a directory on the Matlab path. Two functions are included: `circ_ksdensity` and `circ_ksdensityn`.

# Uni-dimensional kernel density estimates — `circ_ksdensity`

Usage:

`[vfEstimate] = circ_ksdensity(vfObservations, vfPDFSamples, <vfDomain, fSigma, vfWeights>)`

This function calculates a kernel density estimate of an (optionally weighted) data sample, over a periodic domain.

`vfObservations` is a set of observations made over a periodic domain, optionally defined by `vfDomain`: `[fMin fMax]`. The default domain is `[0..2*pi]`. `vfPDFSamples` defines the sample points over which to perform the kernel density estimate, over the same domain as `vfObservations`.

Weighted estimations can be performed by providing the optional argument `vfWeights`, where each element in `vfWeights` corresponds to the matching element in `vfObservations`.

The kernel density estimate will be performed using a wrapped Gaussian kernel, with a width estimated as

`(4/3)^0.2 * circ_std(vfObservations, vfWeights) * (length(vfObservations)^-0.2)`

The optional argument `fSigma` can be provided to set the width of the kernel.
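For reference, the wrapped Gaussian kernel on the default domain can be written in its standard textbook form, shown below. Here $\sigma$ is the kernel width (`fSigma`, or the estimate above) and $k$ counts whole periods; both symbols are introduced only for illustration. How many terms of the sum `circ_ksdensity` actually keeps is not stated in this post.

$$
K_\sigma(\theta) = \frac{1}{\sigma\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} \exp\left(-\frac{(\theta + 2\pi k)^2}{2\sigma^2}\right)
$$

For a domain `[fMin fMax]` other than the default, the period $2\pi$ would be replaced by `fMax - fMin`.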

`vfEstimate` will be a vector with a (weighted) estimate of the underlying distribution, with an entry for each element of `vfPDFSamples`. If no weighting is supplied, the estimate will be scaled such that it forms a PDF estimate over the supplied sample domain, taking into account sample bin widths. If a weight vector is supplied then the estimate will be scaled such that the sum over the domain attempts to match the sum of weights, taking into account sample bin widths.
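The scaling described above can be illustrated with a minimal, self-contained sketch of a weighted wrapped-Gaussian estimate. It is not the code inside `circ_ksdensity`: the data, the hand-picked `fSigma`, the truncation to seven wraps (`vnWraps`), and the regular sample grid are all assumptions made for the example.

```matlab
% Illustrative sketch only, not the implementation of circ_ksdensity.
vfObservations = [0.1; 0.3; 6.2; 3.0; 3.2];   % made-up angles in radians (column vector)
vfWeights      = [2; 1; 1; 0.5; 0.5];         % made-up weights, one per observation
fSigma         = 0.5;                         % kernel width, chosen by hand here
vnWraps        = -3:3;                        % assumed truncation of the wrapped sum

vfPDFSamples = linspace(0, 2*pi, 201);        % regular grid over the default domain
vfPDFSamples = vfPDFSamples(1:end-1);         % drop 2*pi, which duplicates 0

vfEstimate = zeros(size(vfPDFSamples));
for nWrap = vnWraps
    % Offset from every observation (rows) to every sample point (columns),
    % shifted by a whole number of periods
    mfDelta  = bsxfun(@minus, vfPDFSamples, vfObservations) + 2*pi*nWrap;
    mfKernel = exp(-mfDelta.^2 / (2 * fSigma^2));
    % Weighted sum of the kernels centred on each observation
    vfEstimate = vfEstimate + vfWeights' * mfKernel;
end
vfEstimate = vfEstimate / (fSigma * sqrt(2*pi));

% Riemann sum over the period, using the bin width of the regular grid
fBinWidth = 2*pi / numel(vfPDFSamples);
fTotal    = sum(vfEstimate) * fBinWidth;      % close to sum(vfWeights)
```

In this sketch, dividing `vfEstimate` by `sum(vfWeights)` would turn the weighted estimate into a PDF that sums to one over the domain.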

# Multi-dimensional kernel density estimates — `circ_ksdensityn`

Usage:

`[vfEstimate, vfBinVol] = circ_ksdensityn(mfObservations, mfPDFSamples, <mfDomains, vfSigmas, vfWeights>)`

This function calculates a kernel density estimate of an (optionally weighted) data sample, over periodic and aperiodic domains. The sample is assumed to be independent across dimensions; i.e. density estimation is performed independently for each dimension of the data.

`mfObservations` is a set of observations made over a (possibly periodic) domain. Each row corresponds to a single observation, each column corresponds to a particular dimension. By default all dimensions are periodic in `[0..2*pi]`; this can be modified by providing the optional argument `mfDomains`. Each row in `mfDomains` is `[fMin fMax]`, one row for each dimension in `mfObservations`. If a particular dimension should not be periodic, the corresponding row should be `[nan nan]`. Bounded support over a dimension is NOT implemented; each dimension is either linear and infinite or periodic.

`mfPDFSamples` defines the sample points over which to perform the kernel density estimate, over the same domains as `mfObservations`.
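As a sketch of how these inputs might be assembled, the snippet below builds `mfDomains` and `mfPDFSamples` for a hypothetical two-dimensional problem with one angular and one linear dimension. The grid sizes, the ranges, and the helper variable `vbPeriodic` are illustrative choices, not part of the downloaded functions.

```matlab
% Hypothetical setup: column 1 is an angle, column 2 is a linear quantity.
mfDomains = [-pi pi;      % dimension 1: periodic on [-pi, pi]
             nan nan];    % dimension 2: not periodic (linear, unbounded)

% Rows with NaN limits mark the non-periodic dimensions (illustrative check only)
vbPeriodic = ~any(isnan(mfDomains), 2);

% Regular grid of sample points, flattened to one row per point
vfAngles = linspace(-pi, pi, 37);
vfAngles = vfAngles(1:end-1);              % 36 angles; pi is dropped since it duplicates -pi
vfLinear = linspace(0, 10, 21);            % 21 positions along the linear dimension
[mfA, mfL] = ndgrid(vfAngles, vfLinear);
mfPDFSamples = [mfA(:) mfL(:)];            % one row per grid point, one column per dimension

% With the downloaded functions on the path, the call would then follow the usage line:
% [vfEstimate, vfBinVol] = circ_ksdensityn(mfObservations, mfPDFSamples, mfDomains);
```

Using `ndgrid` rather than `meshgrid` keeps the first column of `mfPDFSamples` varying fastest, which makes it straightforward to `reshape` the returned estimate back onto the grid.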

Weighted estimations can be performed by providing the optional argument `vfWeights`, where each element in `vfWeights` corresponds to the matching observation in `mfObservations`.

The kernel density estimate will be performed using a multivariate Gaussian kernel, independent along each dimension, and wrapped along the periodic dimensions as appropriate. Kernel widths over periodic dimensions are estimated as

`(4/3)^0.2 * circ_std(mfObservations(:, nDim), vfWeights) * (length(mfObservations)^-0.2)`

Kernel widths over non-periodic dimensions are estimated as

`(4 * std(mfObservations(:, nDim), vfWeights)^5 / 3 / length(mfObservations))^(1/5)`

The optional argument `vfSigmas` can be provided to set the width of each kernel.
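The two default width expressions are the same normal-reference rule of thumb written in two ways, as a short rearrangement shows. Here $\hat\sigma$ stands for the spread estimate (`circ_std` on periodic dimensions, `std` on linear ones) and $n$ for the number of observations; both symbols are introduced only for this illustration.

$$
\left(\frac{4\,\hat\sigma^{5}}{3\,n}\right)^{1/5} = \left(\frac{4}{3}\right)^{1/5} \hat\sigma\, n^{-1/5} \approx 1.06\,\hat\sigma\, n^{-1/5}
$$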

`vfEstimate` will be a vector with a (weighted) histogram estimate of the underlying distribution, with an entry for each point in `mfPDFSamples`. If no weighting is supplied, the estimate will be scaled to estimate a PDF over the supplied multi-dimensional domain, taking into account the estimated volume of each bin. If a weight vector is supplied, the estimate will be scaled such that the sum over the domain attempts to match the sum of weights, taking into account the multi-dimensional bin volumes.

`vfBinVol` is a vector containing volume estimates for each row in `mfPDFSamples`, under the assumption that each dimension is linearly scaled and mutually orthogonal.
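To make the roles of the estimate and the bin volumes concrete, here is a minimal sketch of an unweighted product-kernel estimate on a regular grid with one periodic and one linear dimension. It is written under stated assumptions and is not the code inside `circ_ksdensityn`: the data and widths are made up, the wrapped sum is truncated to seven terms, and the bin volume is taken to be the product of the grid spacings, which only holds for a regular grid.

```matlab
% Illustrative sketch only, not the implementation of circ_ksdensityn.
mfObservations = [0.2 1.0; 0.5 1.5; 6.0 0.8; 3.1 2.2];   % made-up data: [angle, linear value]
vfSigmas       = [0.6 0.5];                               % hand-picked kernel widths

vfAngles = linspace(0, 2*pi, 101);
vfAngles = vfAngles(1:end-1);                 % 100 angles on [0, 2*pi)
vfLinear = linspace(-4, 7, 221);              % linear grid, wide enough to hold the tails
[mfA, mfL] = ndgrid(vfAngles, vfLinear);
mfPDFSamples = [mfA(:) mfL(:)];

% Assumed bin volume for a regular, orthogonal grid: product of the spacings
fBinVol  = (vfAngles(2) - vfAngles(1)) * (vfLinear(2) - vfLinear(1));
vfBinVol = fBinVol * ones(size(mfPDFSamples, 1), 1);

vfEstimate = zeros(size(mfPDFSamples, 1), 1);
for nObs = 1:size(mfObservations, 1)
    % Periodic dimension: Gaussian summed over a few whole-period shifts
    vfKernA = zeros(size(vfEstimate));
    for nWrap = -3:3
        vfDelta = mfPDFSamples(:, 1) - mfObservations(nObs, 1) + 2*pi*nWrap;
        vfKernA = vfKernA + exp(-vfDelta.^2 / (2 * vfSigmas(1)^2));
    end
    vfKernA = vfKernA / (vfSigmas(1) * sqrt(2*pi));

    % Linear dimension: ordinary Gaussian
    vfDelta = mfPDFSamples(:, 2) - mfObservations(nObs, 2);
    vfKernL = exp(-vfDelta.^2 / (2 * vfSigmas(2)^2)) / (vfSigmas(2) * sqrt(2*pi));

    % Independent dimensions: the joint kernel is the product of the two
    vfEstimate = vfEstimate + vfKernA .* vfKernL;
end
vfEstimate = vfEstimate / size(mfObservations, 1);

% Approximate integral over the sampled domain, which should be close to 1
fTotal = sum(vfEstimate .* vfBinVol);
```

Because `mfPDFSamples` was built with `ndgrid`, `reshape(vfEstimate, numel(vfAngles), numel(vfLinear))` recovers the estimate as a matrix for plotting.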