Modelling one side of a two-sided problem

Ah models, my old friends. You’re always wrong, but sometimes helpful. Often dangerous too.

A recent article in The Actuary magazine addressed whether “de-risking in members’ best interests?”  I say “recent” even though it’s from August because I am a little behind on my The Actuary reading.

In the article, the authors demonstrate that by modelling the impact of covenant risk, optimal investment portfolios for Defined Benefit (DB) pensions actually have more risky assets than if this covenant risk is ignored.

The covenant they refer to is the obligation of the sponsor to make good deficits within the pension fund. Covenant risk then is the risk that the sponsor is unable (typically through its own insolvency) to make good on this promise.

On the surface it should seem counterintuitive that by modelling an additional risk to pensioners, the answer is to invest in riskier assets, thus increasing risk.

The explanation proffered by the authors is that the higher expected returns from riskier assets allow the fund to potentially build up surplus, thus reducing the risks of covenant failure.

I can follow that logic, particularly in the case where the dependence between DB fund insolvency and sponsor default is week. It doesn’t mean it’s a useful result. Continue reading “Modelling one side of a two-sided problem”

Actuarial sloppiness

An actuary I know once made me cringe by saying “It doesn’t matter how an Economic Scenario Generator is constructed, if it meets all the calibration tests then it’s fine. A machine learning black box is as good as any other model with the same scores.” The idea being that if the model outputs correctly reflect the calibration inputs, the martingale test worked and the number of simulations generated produced an acceptably low standard error then the model is fit for purpose and is as good as any other model with the same “test scores”.

This is an example of actuarial sloppiness and is of course quite wrong.

There are at least three clear reasons why the model could still be dangerously specified and inferior to a more coherently structured model with worse “test scores”.

The least concerning of the three is probably interpolation. We rarely have a complete set of calibration inputs. We might have equity volatility at 1 year, 3 year, 5 year and an assumed long-term point of 30 years as calibration inputs to our model. We will be using the model outputs for many other points and just because we confirmed that the output results are consistent with the calibration inputs says nothing about whether the 2 year or 10 year volatility are appropriate.

The second reason is related – extrapolation. We may well be using model outputs beyond the 30 year point for which we have a specific calibration. A second example would be the volatility skew implied by the model even if none were specified – a more subtle form of extrapolation

A typical counter to these first two concerns is to use a more comprehensive set of calibration tests. Consider the smoothness of the volatility surface and ensure that extrapolations beyond the last calibration point are sensible. Good ideas both, but already we are veering away from a simplified calibration test score world and introducing judgment (a good thing!) into the evaluation.

There are limits to the “expanded test” solution. A truly comprehensive set of tests might well be impossibly large if not infinite with increasing cost to this brute force approach.

The third is a function of how the ESG is used. Most likely, the model is being used to value a complex guarantee or exotic derivative with a set of pay-offs based on the results of the ESG. Two ESGs could have the same calibration test results, even calculating similar at-the-money option values but value path-dependent or otherwise more exotic or products very differently due to different serial correlations or untested higher moments.

It is unlikely that a far out-of-the-money binary option was part of the calibration inputs and tests. If it were, it is certain that another instrument with information to add from a complete market was excluded. The set of calibration inputs and tests can never be exhaustive.

It turns out there is an easier way to decreasing the risk that the interpolated and extrapolated financial outcomes of using the ESG are nonsense: Start with a coherent model structure for the ESG. By using a logical underlying model incorporating drivers and interactions that reflect what we know of the market, we bring much of what we would need to add in that enormous set of calibration tests into the model and increase the likelihood of usable ESG results.

Open mortality data

The Continuous Statistical Investment Committee of the Actuarial Society does fabulous work at gathering industry data and analysing it for broad use and consumption by actuaries and others.

I can only begin to imagine the data horrors of dealing with multiple insurers, multiple sources, multiple different data problems. The analysis they do is critically useful and, in technical terms, helluva interesting. I enjoyed the presentation at both the Cape Town and Johannesburg #LACseminar2013 just because there is such a rich data set and the analysis is fascinating.

I do hope they agree to my suggestion to put the entire, cleaned, anonymised data set available on the web. Different parties will want to analyse the data in different ways; there is simply no way the CSI Committee can perform every analysis and every piece of investigation that everyone might want. Making the data publicly available gives actuaries, students, academics and more the ability to perform their own analysis. And at basically no cost.

The other, slightly more defensive reason, is that mistakes do happen from time to time. I’m very aware of the topical R-R paper that was based on flawed analysis of underlying data. Mistakes happen all the time, and allowing anyone who wants to have access to the data to repeat or disprove calculations and analysis only makes the results more robust.

So, here’s hoping for open access mortality investigation data for all! And here’s thanking the CSI committee (past and current) for everything they have already done.

The importance of verification

It’s amazing to test this yourself. Hold down the button of a garage door opener and try to use your vehicle lock / unlock button. It doesn’t work. Simple as that. The signal from the garden variety garage door opener blocks the signal of most (all?) vehicle remotes.

Smart criminals are increasingly using this to stop owners locking their doors and stealing goodies from cars.  This seems to be getting worse in Cape Town CBD at the moment.

So why I am talking about verification?

Well, to prevent the problem requires either a high tech revamp of the vehicle remote industry, or for owners to actually observe or check that their doors are in fact locked. Listening for the clack, watching the hazard lights flash or even physically testing that the door is locked.

These methods work perfectly and are only the tiniest bit of extra effort.

Relying on models to simply give the right answer without checking that it makes sense is likely to get your credibility burgled. Form high level initial expectations, hypothesize plausible outcomes, check sensitivity to assumptions, reproduce initial results and run step-wise through individual changes, get a second opinion and fresh eyes.

This all takes a little time in the short term and saves everything in the long term.

How not to calibrate a model

Any model is a simplification of reality. If it isn’t, then it isn’t a model as rather is the reality.


Any simplified model I can imagine will also therefore not match reality exactly. The closer the model gets to the real world in more scenarios, the better it is.

Not all model parameters are created equal

Part of the approach to getting a model to match reality as closely as possible is calibration. Models will typically have a range of parameters. Some will be well-established and can be set confidently without much debate. Others will have a range of reasonable or possible values based on empirical research or theory. Yet others will be relatively arbitrary or unobservable.

We don’t have to guess these values, even for the unobservable parameters. Through the process of calibration, the outputs of our model can be matched as closely as possible to actual historical values by changing the input parameters. The more certain we are of the parameters a priori the less we vary the parameters to calibrate the model. The parameters with most uncertainty are free to move as much as possible to fit the desired outputs.

During this process, the more structure or relationships that can be specified the better. The danger is that with relatively few data points (typically) and relatively many parameters (again typically) there will be multiple parameter sets that fit the data with possibly only very limited difference in “goodness of fit” for the results. The more information we add to the calibration process (additional raw data, more narrowly constrained parameters based on other research, tighter relationships between parameters) the more likely we are to derive a useful, sensible model that not only fits out calibration data well but also will be useful for predictions of the future or different decisions.

How not to calibrate a model

Scientific American has a naive article outlining “why economic models are always wrong”. I have two major problems with the story: Continue reading “How not to calibrate a model”

Multi-tasking lowers productivity

I have to multi-task. I am the bottle-neck for too many problems already and my team needs input from me before they can continue in many areas.

But I know it doesn’t make me efficient. Switching between tasks takes time. You forget important details, and struggle to get the depth of understanding and focus required for complex issues. The cross-pollination of ideas and solutions doesn’t come close to making up for these drawbacks.

From the research:

Descriptive evidence suggests that judges who keep fewer trials active and wait to close the open ones before starting new ones, dispose more rapidly of a larger number of cases per unit of time. In this way, their backlog remains low even though they receive the same workload as other judges who juggle more trials at any given time.

Did you read those magic words? “…backlog remains low…” I don’t know anyone who doesn’t wish for the luxury of a shorter backlog of work.

The paper itself is fairly complex, analysing theoretical models of human task scheduling.  You should probably add it to your pile of things to read in the middle of other work.