Selection of a kernel bandwidth for measuring dependence in hydrologic time series using the mutual information criterion

Citation
Ti. Harrold et al., Selection of a kernel bandwidth for measuring dependence in hydrologic time series using the mutual information criterion, STOCH ENV R, 15(4), 2001, pp. 310-324
Number of citations
19
Language
English
Article type
Article
Subject categories
Environment/Ecology, Environmental Engineering & Energy
Journal title
STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT
ISSN journal
1436-3240
Volume
15
Issue
4
Year of publication
2001
Pages
310 - 324
Database
ISI
SICI code
1436-3240(200108)15:4<310:SOAKBF>2.0.ZU;2-V
Abstract
Mutual information is a generalised measure of dependence between any two variables. It can be used to quantify non-linear as well as linear dependence between any two variables. This makes mutual information an attractive alternative to the use of the correlation coefficient, which can only quantify the linear dependence pattern. Mutual information is especially suited for application to hydrological problems, because the dependence between any two hydrologic variables is seldom linear in nature. Calculation of the mutual information score involves estimation of the marginal and joint probability density functions of the two variables. This paper uses nonparametric kernel density estimation methods to estimate the probability density functions. Accurate estimation of the mutual information score using kernel methods requires selection of appropriate smoothing parameters (bandwidths) for use with the kernels. The aim of this paper is to obtain a practical method for bandwidth selection for calculation of the mutual information score. In this paper, the lag-one dependence structures of several autocorrelated time series are analysed using mutual information (note that this produces the lag-one auto-MI score, the analog of the lag-one autocorrelation). Empirical trials are used to select appropriate bandwidths for a range of underlying autoregressive and autoregressive-moving average models with normal or near-normal parent distributions. Expressions for reasonable bandwidth choices under these conditions are proposed.
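Illustrative sketch
The abstract describes estimating a lag-one auto-MI score from kernel estimates of the marginal and joint densities. The following is a minimal sketch of that general idea, not the authors' procedure. It assumes Gaussian product kernels, data standardised to unit variance, and the Gaussian reference bandwidth as a stand-in, because the bandwidth expressions proposed in the paper are not given in this record. The function names (gaussian_reference_bandwidth, lag_one_auto_mi, simulate_ar1) are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_reference_bandwidth(n, d):
    """Gaussian reference bandwidth for n standardised points in d dimensions.

    Assumption: a common rule of thumb used here as a placeholder,
    not the bandwidth expression proposed in the paper.
    """
    return (4.0 / (d + 2.0)) ** (1.0 / (d + 4.0)) * n ** (-1.0 / (d + 4.0))


def standardise(values):
    """Rescale a sequence to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def gaussian_kernel(u, h):
    """One-dimensional Gaussian kernel with bandwidth h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def lag_one_auto_mi(series, h_marginal=None, h_joint=None):
    """Estimate the lag-one auto-MI score (in nats) by kernel density estimation.

    The score is the sample average of log(f_xy / (f_x * f_y)) over the
    pairs (x_t, x_{t+1}), with every density replaced by its kernel estimate.
    """
    x = standardise(series[:-1])  # values at time t
    y = standardise(series[1:])   # values at time t + 1
    n = len(x)
    h1 = h_marginal if h_marginal is not None else gaussian_reference_bandwidth(n, 1)
    h2 = h_joint if h_joint is not None else gaussian_reference_bandwidth(n, 2)

    total = 0.0
    for i in range(n):
        fx = sum(gaussian_kernel(x[i] - x[j], h1) for j in range(n)) / n
        fy = sum(gaussian_kernel(y[i] - y[j], h1) for j in range(n)) / n
        fxy = sum(
            gaussian_kernel(x[i] - x[j], h2) * gaussian_kernel(y[i] - y[j], h2)
            for j in range(n)
        ) / n
        total += math.log(fxy / (fx * fy))
    return total / n


def simulate_ar1(phi, n, seed, burn_in=100):
    """Simulate an AR(1) series x_t = phi * x_{t-1} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


if __name__ == "__main__":
    dependent = simulate_ar1(phi=0.8, n=300, seed=1)
    independent = simulate_ar1(phi=0.0, n=300, seed=1)
    print("auto-MI, AR(1) with phi = 0.8:", lag_one_auto_mi(dependent))
    print("auto-MI, white noise:", lag_one_auto_mi(independent))
```

For a bivariate normal pair with correlation rho, the theoretical mutual information is -0.5 ln(1 - rho^2), which gives a reference value against which a kernel estimate for a normal AR(1) series can be compared. The estimate depends noticeably on the bandwidths passed as h_marginal and h_joint, which is the sensitivity the paper sets out to address.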