Functional Chemometrics: Automated Spectral Smoothing with Spatially Adaptive Splines
Fernandes, Philip Manuel
MetadataShow full item record
Functional data analysis (FDA) is a demonstrably effective, practical, and powerful method of data analysis, yet it remains virtually unheard of outside of academic circles and has almost no exposure to industry. FDA adds to the milieu of statistical methods by treating functions of one or more independent variables as data objects, analogous to the way in which discrete points are the data objects we are familiar with in conventional statistics. The first step in functional analysis is to “functionalize” the data, or convert discrete points into a system represented most times by continuous functions. Choosing the type of functions to use is data-dependent and often straightforward – for example, Fourier series lend themselves well to periodic systems, while splines offer great flexibility in approximating more irregular trends, such as chemical spectra. This work explores the question of how B-splines can be rapidly and reliably used to denoised infrared chemical spectra, a difficult problem not only because of the many parameters involved in generating a spline fit, but also due to the disparate nature of spectra in terms of shape and noise intensity. Automated selection of spline parameters is required to support high-throughput analysis, and the heteroscedastic nature of such spectra presents challenges for existing techniques. The heuristic knot placement algorithm of Li et al. (2005) for 1D object contours is extended to spectral fitting by optimizing the denoising step for a range of spectral types and signal/noise ratios, using the following criteria: robustness to types of spectra and noise conditions, parsimony of knots, low computational demand, and ease of implementation in high-throughput settings. Pareto-optimal filter configurations are determined using simulated data from factorial experimental designs. The improved heuristic algorithm uses wavelet transforms and provides improved performance in robustness, parsimony of knots and the quality of functional regression models used to correlate real spectral data with chemical composition. In practical applications, functional principal component regression models yielded similar or significantly improved results when compared with their discrete partial least squares counterparts.