How not to calibrate a model

Any model is a simplification of reality. If it isn’t, then it isn’t a model but rather reality itself.


Any simplified model I can imagine will therefore not match reality exactly. The more closely a model matches the real world, and the more scenarios in which it does so, the better it is.

Not all model parameters are created equal

Part of the approach to getting a model to match reality as closely as possible is calibration. Models will typically have a range of parameters. Some will be well-established and can be set confidently without much debate. Others will have a range of reasonable or possible values based on empirical research or theory. Yet others will be relatively arbitrary or unobservable.

We don’t have to guess these values, even for the unobservable parameters. Through the process of calibration, the outputs of our model can be matched as closely as possible to actual historical values by changing the input parameters. The more certain we are of a parameter a priori, the less we vary it during calibration. The parameters with the most uncertainty are free to move as far as needed to fit the desired outputs.
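One common way to express “vary the certain parameters less” is penalised least squares: each parameter pays a penalty for drifting from its a priori value, and the penalty weight reflects how confident we are. The sketch below is only an illustration under that assumption. The straight-line model, the function name `calibrate_line`, the data and the weights are all hypothetical.

```python
def calibrate_line(ts, ys, prior, weight):
    """Fit y = a + b*t by least squares, penalising drift from prior values.

    prior  = (a0, b0): a priori best guesses for the two parameters.
    weight = (wa, wb): penalty strengths; a larger weight means we are
             more certain of the prior and let the parameter move less.
    """
    a0, b0 = prior
    wa, wb = weight
    n = len(ts)
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))

    # Normal equations of the objective
    #   sum((a + b*t - y)^2) + wa*(a - a0)^2 + wb*(b - b0)^2
    m11, m12, m22 = n + wa, s_t, s_tt + wb
    r1, r2 = s_y + wa * a0, s_ty + wb * b0

    # Solve the 2x2 linear system with Cramer's rule.
    det = m11 * m22 - m12 * m12
    a = (r1 * m22 - m12 * r2) / det
    b = (m11 * r2 - m12 * r1) / det
    return a, b


# Made-up "historical" data lying exactly on y = 1 + 2t.
ts = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

# No prior information: both parameters are free to move.
free = calibrate_line(ts, ys, prior=(0.0, 0.0), weight=(0.0, 0.0))

# Intercept treated as well established at 0; only the slope can adapt.
pinned = calibrate_line(ts, ys, prior=(0.0, 0.0), weight=(1e9, 0.0))

print("free fit:  ", free)
print("pinned fit:", pinned)
```

With zero weights the fit recovers the line that generated the data. With a very heavy weight on the intercept, the intercept stays at its prior and the slope absorbs all of the adjustment.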

During this process, the more structure or relationships that can be specified the better. The danger is that with relatively few data points (typically) and relatively many parameters (again typically) there will be multiple parameter sets that fit the data, possibly with very little difference in “goodness of fit” between them. The more information we add to the calibration process (additional raw data, more narrowly constrained parameters based on other research, tighter relationships between parameters), the more likely we are to derive a useful, sensible model that not only fits our calibration data well but is also useful for predicting the future or evaluating different decisions.

How not to calibrate a model

Scientific American has a naive article outlining “why economic models are always wrong”. I have two major problems with the story:

  1. All models are wrong. Some are useful (George Box). “Wrongness” isn’t a problem with a model, but lack of usefulness is. The headline shows that the article starts out poorly informed about the purpose of economic models.
  2. The calibration approach criticised in the article is an extremely poor way to calibrate a model. No serious researcher thinks that is the right way to calibrate a model. So the article merely creates a straw man and then demonstrates how easy it is to knock the argument over.

Calibration and back-testing on separate data sets

The right way to calibrate a model is to separate the data-set into at least two independent subsets. The first is the “training set”, the portion against which we calibrate our parameters so that the model matches the data as closely as possible. Again, this should make use of all information available and may give rise to several competing models that appear to fit the data similarly well.

The next step is crucial. We back-test the derived models against the second subset of data. This data comes from the same reality (perhaps a different time period) as the data used to calibrate the model, but the model won’t trivially match it because none of it was used to calibrate the model in the first place.
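The calibrate-then-back-test workflow can be sketched in a few lines. This is a minimal illustration, not a recommended procedure: the series is synthetic, the two candidate models (a constant and a linear trend) are deliberately simple, and the helper names are hypothetical.

```python
from math import sqrt


def split_by_time(points, n_train):
    """Earlier observations calibrate the model; later ones back-test it."""
    return points[:n_train], points[n_train:]


def fit_constant(points):
    """Candidate 1: predict the calibration-period mean for every t."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda t: mean_y


def fit_linear(points):
    """Candidate 2: ordinary least-squares straight line."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    intercept = mean_y - slope * mean_t
    return lambda t: intercept + slope * t


def rmse(model, points):
    """Root-mean-square error of a model over a set of (t, y) points."""
    return sqrt(sum((model(t) - y) ** 2 for t, y in points) / len(points))


# Synthetic history: an upward trend plus a small fixed wiggle (made up).
wiggle = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.0, 0.05, -0.05, 0.1]
history = [(t, 2.0 + 0.5 * t + wiggle[t]) for t in range(10)]

train, holdout = split_by_time(history, n_train=7)

candidates = {"constant": fit_constant(train), "linear": fit_linear(train)}

# Score every candidate on both subsets; only the hold-out score is a
# genuine back-test, because that data played no part in the calibration.
scores = {name: (rmse(model, train), rmse(model, holdout))
          for name, model in candidates.items()}

for name, (in_sample, out_of_sample) in scores.items():
    print(f"{name:8s} in-sample RMSE={in_sample:.3f} "
          f"back-test RMSE={out_of_sample:.3f}")
```

The split is by time rather than at random, so the back-test mimics how the model would actually be used: calibrated on the past and judged on a period it has not seen.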

The importance of back-testing

Back-testing is critically important in the model building process, but back-testing against the same data used to calibrate the model is worse than useless, since it takes time and effort and can create a false sense of accuracy or reliability in the model. Separating the data into two or more subsets is absolutely required, although it has the unfortunate side-effect of reducing the size of the data-set available for calibration.

Yes, it really matters.

A common example of the dangers of bad models fitting data well is with Economic Scenario Generators. These simulate economic scenarios to be used in valuing complex financial securities. If a model is properly calibrated, it will recreate the observable market prices of a wide range of instruments. However, the model could be a black-box neural network, a carefully constructed theoretical model with plausible relationships and constraints, or the proverbial ten thousand (possibly inebriated) monkeys. If all three models are perfectly calibrated to observable market prices, is any of the models inferior to any of the others?

Clearly the answer is yes, but only when it comes to extrapolation. I have far more confidence in the model’s ability to create “market consistent” valuations for instruments that do not have observable prices in the market if I understand how the mechanics of the model make sense on a level other than pure calibration.

Trivial 3 point example

Figure: Perfect fit of two models to 3 data points

The example above shows a perfect fit of a quadratic and a cubic model to 3 data points. Judged only at those three points, the two models look exactly the same.

However, if we use the models to extrapolate to future time periods, the results are very different. Without additional data to back-test the results on, it’s not possible to tell whether either of these models is appropriate, but clearly both can’t be correct.

Figure: Extrapolation of the two models shows divergent results
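The same effect is easy to reproduce numerically. The three data points and the cubic coefficient `c` below are made up for illustration and are not the values behind the charts. Any cubic of the form “quadratic plus `c` times (t - t1)(t - t2)(t - t3)” passes through the same three points, so the choice of `c` is invisible in the calibration data.

```python
def quadratic_through(points):
    """The unique quadratic through three points (Lagrange interpolation)."""
    def model(t):
        total = 0.0
        for i, (ti, yi) in enumerate(points):
            term = yi
            for j, (tj, _) in enumerate(points):
                if j != i:
                    term *= (t - tj) / (ti - tj)
            total += term
        return total
    return model


def cubic_through(points, c):
    """One of infinitely many cubics through the same three points.

    The added term c*(t - t1)*(t - t2)*(t - t3) is zero at every data
    point, so the fit stays perfect whatever value c takes.
    """
    quadratic = quadratic_through(points)

    def model(t):
        bump = c
        for tj, _ in points:
            bump *= (t - tj)
        return quadratic(t) + bump
    return model


# Hypothetical calibration data: three observations.
points = [(0, 1.0), (1, 2.0), (2, 4.0)]

quadratic = quadratic_through(points)
cubic = cubic_through(points, c=1.0)

print("Calibration points (both models fit perfectly):")
for t, y in points:
    print(f"  t={t}  data={y:.1f}  "
          f"quadratic={quadratic(t):.1f}  cubic={cubic(t):.1f}")

print("Extrapolation (the models diverge):")
for t in (3, 4, 5):
    print(f"  t={t}  quadratic={quadratic(t):.1f}  cubic={cubic(t):.1f}")
```

Nothing in the three calibration points can distinguish the two models. Only additional data, or a constraint from theory on the shape of the curve, can.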