Selection of Simplified Models and Parameter Estimation Using Limited Data
Simplified Models, Limited Data, Chemical Engineering, Model Selection
Because complex models are difficult to formulate and reliable estimates of their unknown parameters are hard to obtain, modellers often use simplified models (SMs) that are structurally imperfect and contain fewer parameters. The objectives of this research are: 1) to develop practical, easy-to-use strategies that help modellers select the best SM from a set of candidate models, and 2) to help modellers decide which parameters in complex models should be estimated and which should be fixed at their initial values. The aim is to select models and parameters so that the best possible predictions are obtained from the available data and the modeller’s engineering and scientific knowledge. This research summarizes the extensive qualitative and quantitative results in the statistics literature on the use of SMs. Mean-squared error (MSE) is used to judge the quality of the predictions obtained from different candidate models, and a confidence-interval approach is developed to assess the uncertainty about whether an SM or the corresponding extended model will give better predictions. Nine commonly applied model-selection criteria (MSC) are reviewed and analyzed for their propensity to prefer SMs. It is shown that, for many MSC, preferential orderings exist that are independent of model structure and of the particular data set. A new MSE-based MSC is developed using univariate linear statistical models, and its effectiveness for selecting dynamic nonlinear multivariate models is demonstrated both theoretically and empirically. The proposed criterion is then applied to determine the optimal number of parameters to estimate in complex models, based on ranked parameter lists obtained from estimability analysis. This approach makes use of the modeller’s prior knowledge about the precision of initial parameter values and is less computationally expensive than comparable methods in the literature.
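To make the MSE trade-off between an SM and its extended model concrete, here is a minimal sketch. It is not the thesis's criterion. It assumes a univariate linear model y = Xβ + ε with Gaussian noise, in which the SM simply omits the last regressor. The sketch uses Monte Carlo to estimate the summed MSE of the fitted mean for both models and compares the estimates with closed-form values. The extended model is unbiased with MSE pσ². The SM gives up some accuracy for precision: its variance falls to (p − 1)σ², but it acquires a squared bias β_p²·x_pᵀ(I − H₁)x_p, where H₁ is the projection onto the SM's columns. The function name `prediction_mse`, the uniform design, and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np


def prediction_mse(beta, sigma=1.0, n=20, n_rep=5000, seed=0):
    """Compare an extended linear model with an SM that drops the last regressor.

    Returns Monte Carlo and closed-form values of the summed MSE of the
    fitted mean, sum_i E[(yhat_i - mu_i)^2], for both models.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    # Fixed design: intercept plus (len(beta) - 1) uniform regressors.
    X = np.column_stack(
        [np.ones(n)] + [rng.uniform(0.0, 1.0, n) for _ in range(len(beta) - 1)]
    )
    X_sm = X[:, :-1]  # simplified model: last column omitted
    mu = X @ beta     # true mean response

    sq_ext = sq_sm = 0.0
    for _ in range(n_rep):
        y = mu + rng.normal(0.0, sigma, n)
        b_ext, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_sm, *_ = np.linalg.lstsq(X_sm, y, rcond=None)
        sq_ext += np.sum((X @ b_ext - mu) ** 2)
        sq_sm += np.sum((X_sm @ b_sm - mu) ** 2)

    # Closed form: the extended model is unbiased with variance p*sigma^2;
    # the SM has variance (p-1)*sigma^2 plus squared bias beta_p^2 * x_p'(I - H1)x_p.
    H1 = X_sm @ np.linalg.pinv(X_sm)  # projection onto the SM columns
    x_last = X[:, -1]
    bias_sq = beta[-1] ** 2 * x_last @ (np.eye(n) - H1) @ x_last
    p = X.shape[1]
    return {
        "mc_ext": sq_ext / n_rep,
        "mc_sm": sq_sm / n_rep,
        "theory_ext": p * sigma**2,
        "theory_sm": (p - 1) * sigma**2 + bias_sq,
    }


# A small omitted coefficient versus a large one (same design, same seed).
for b_last in (0.3, 3.0):
    r = prediction_mse(beta=[1.0, 2.0, b_last])
    print(b_last, {k: round(float(v), 2) for k, v in r.items()})
```

In this setting, the SM has the lower MSE exactly when β_p²·x_pᵀ(I − H₁)x_p < σ². In words, dropping the parameter pays off when its true value is small compared with the precision with which the available data could estimate it. With the small coefficient the SM should come out ahead, and with the large one the extended model should.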
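A data-independent preferential ordering of MSC can be illustrated with two standard criteria, AIC and BIC, applied to nested Gaussian linear models. This abstract does not name the nine MSC analyzed in the thesis, so these two are assumed here purely as an example. When an SM is compared with its extended model, both criteria use the same fit term, n·ln(SSE_SM/SSE_ext). They differ only in the penalty per omitted parameter, which is 2 for AIC and ln n for BIC. Because ln n > 2 whenever n ≥ 8, BIC prefers the SM on every data set where AIC does, whatever the design matrix or the data. The sketch below checks this on simulated data. The helper names `gaussian_ic`, `preferences`, and `count_orderings` are hypothetical.

```python
import numpy as np


def gaussian_ic(y, X):
    """Return AIC and BIC for an OLS fit of y on X, assuming Gaussian errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    k = p + 1  # regression coefficients plus the error variance
    neg2_loglik = n * (np.log(2.0 * np.pi * sse / n) + 1.0)  # -2 * max log-likelihood
    return {"AIC": neg2_loglik + 2.0 * k, "BIC": neg2_loglik + k * np.log(n)}


def preferences(y, X_ext, n_drop=1):
    """For each criterion, True if the SM (last n_drop columns removed) scores no worse."""
    ic_ext = gaussian_ic(y, X_ext)
    ic_sm = gaussian_ic(y, X_ext[:, :-n_drop])
    return {name: bool(ic_sm[name] <= ic_ext[name]) for name in ic_ext}


def count_orderings(n=20, n_datasets=2000, seed=1):
    """Tally how often each criterion prefers the SM over many simulated data sets."""
    rng = np.random.default_rng(seed)
    counts = {"sm_by_both": 0, "sm_by_aic_only": 0, "sm_by_bic_only": 0, "sm_by_neither": 0}
    for _ in range(n_datasets):
        X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
        beta = np.array([1.0, 1.0, rng.normal(scale=0.5)])  # last coefficient varies
        y = X @ beta + rng.normal(size=n)
        pref = preferences(y, X)
        if pref["AIC"] and pref["BIC"]:
            counts["sm_by_both"] += 1
        elif pref["AIC"]:
            counts["sm_by_aic_only"] += 1
        elif pref["BIC"]:
            counts["sm_by_bic_only"] += 1
        else:
            counts["sm_by_neither"] += 1
    return counts


print(count_orderings())
```

With n = 20, `sm_by_aic_only` should stay at zero. Some data sets should still land in `sm_by_bic_only`, which shows that BIC prefers SMs at least as often as AIC does in this nested Gaussian setting.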