
Presentation to ACAL on GI Pricing

Yesterday I gave a presentation to the Lebanese Insurance Association (ACAL, its French acronym) on a holistic approach to ratemaking using predictive models. Over a hundred people attended, and there certainly seemed to be interest in the topic.

A common response, though, was that Lebanon isn’t ready for this yet, because rates are so low and nobody is prepared to change their approach. I accept that changing the “way things are done” in a fundamental way takes time and courage, but I expect that some players will start collecting the data, doing the analysis and improving their pricing in the next few years. By 2013, the market here will not be the same. The advantages across general insurance, banking, sales and cross-selling are simply too great, and the techniques available are fantastic and can be implemented quite easily.

The official press release is given below, and the presentation, ACAL GI Pricing 2008 (PDF), is available under Resources on this site.

Insurance companies can generate a competitive advantage through accurate ratemaking, systematic risk-adjusted pricing, and careful analysis of policyholder price sensitivity at renewal dates. Single-variable techniques can provide valuable insights into risk factors, but do not perform well in the presence of multiple drivers of risk.

Generalised Linear Modelling (GLM) is the preferred approach for robust, multivariate modelling of claim frequency and severity. A GLM can model several rating factors simultaneously, including interactions between rating factors. It is used extensively in the UK, the US and other highly competitive and developed insurance markets.
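To make the contrast between one-way and multivariate analysis concrete, here is a minimal sketch of a claim-frequency GLM: a Poisson model with a log link and log(exposure) as an offset, fitted by iteratively reweighted least squares (IRLS) in plain Python. The portfolio, the two rating factors (`young`, `urban`) and their true relativities are all hypothetical, chosen so that young drivers are concentrated in urban areas. In practice you would use a statistics package rather than hand-rolled IRLS.

```python
import math

# Hypothetical aggregated portfolio: (young, urban, exposure_years, claims).
# Claims were built as exposure * 0.05 * 2.0**young * 1.5**urban, so the
# true multiplicative relativities are 2.0 (young) and 1.5 (urban).
CELLS = [
    (0, 0, 4000.0, 200.0),
    (0, 1, 1000.0, 75.0),
    (1, 0, 500.0, 50.0),
    (1, 1, 2500.0, 375.0),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(cells, iterations=25):
    """Poisson GLM with log link and log(exposure) offset, fitted by IRLS."""
    X = [[1.0, float(young), float(urban)] for young, urban, _, _ in cells]
    offset = [math.log(exposure) for _, _, exposure, _ in cells]
    y = [claims for _, _, _, claims in cells]
    # Start from the overall claim frequency, with no factor effects.
    beta = [math.log(sum(y) / sum(c[2] for c in cells)), 0.0, 0.0]
    p = len(beta)
    for _ in range(iterations):
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) + o for row, o in zip(X, offset)]
        mu = [math.exp(e) for e in eta]
        # Log link: working response (offset removed) and weights w = mu.
        z = [e - o + (yi - m) / m for e, o, yi, m in zip(eta, offset, y, mu)]
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        beta = solve(xtwx, xtwz)
    return beta


def one_way_relativity(cells, index):
    """Naive single-factor relativity: frequency at level 1 / frequency at level 0."""
    def frequency(level):
        exposure = sum(c[2] for c in cells if c[index] == level)
        claims = sum(c[3] for c in cells if c[index] == level)
        return claims / exposure
    return frequency(1) / frequency(0)


beta = fit_poisson_glm(CELLS)
glm_young, glm_urban = math.exp(beta[1]), math.exp(beta[2])
naive_young = one_way_relativity(CELLS, 0)
print(f"GLM relativities: young {glm_young:.3f}, urban {glm_urban:.3f}")
print(f"One-way young relativity: {naive_young:.3f}")
```

With these made-up numbers the GLM recovers the relativities of 2.0 and 1.5 built into the data, while the one-way young relativity comes out at about 2.58: because young drivers are concentrated in urban areas, the single-variable view attributes part of the urban effect to age.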

Judgement and experience are required when assessing different models and interpreting the diagnostic tests used to ensure accurate and robust results. A good model can make dramatic improvements in the separation of high- and low-risk policyholders.

These advanced approaches all have increased data requirements. Companies looking to reap the rewards of improved ratemaking will need to develop the databases and systems to store exposure, claims and rating factor data. There is a range of software available to perform the statistical analysis, from expensive purpose-built systems to freely available, open-source statistical platforms.

Successful implementation of an advanced rating system depends on commitment of key staff to the project and the inclusion of marketing, underwriting, legal, IT and actuarial skills in the project team. Market characteristics and reluctance to change are constraints to the adoption of advanced techniques. These have been faced and overcome in many other markets. It is only a matter of time before insurers must use these techniques even to maintain their competitive position.  Early movers will enjoy an improvement in their competitive position, market share and profitability.

Don’t use Altman’s Z-score for managing a turnaround

I attended a workshop presented by the famous credit analyst and model builder, Professor Edward Altman. He is probably best known for inventing the seemingly immortal Z-score, which is still in use 40 years after its creation in 1968.
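For readers who haven’t met it, the Z-score is a weighted sum of five accounting ratios. The sketch below uses the commonly quoted ratio-form coefficients and cut-offs for the original 1968 model (publicly traded manufacturers). The company figures are hypothetical, and later variants of the model (for private or non-manufacturing firms) use different weights and cut-offs.

```python
def altman_z(working_capital, retained_earnings, ebit, market_value_equity,
             sales, total_assets, total_liabilities):
    """Original (1968) Altman Z-score, in its commonly quoted ratio form."""
    x1 = working_capital / total_assets           # liquidity
    x2 = retained_earnings / total_assets         # cumulative profitability
    x3 = ebit / total_assets                      # operating return on assets
    x4 = market_value_equity / total_liabilities  # market value of equity vs book liabilities
    x5 = sales / total_assets                     # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Commonly quoted cut-offs for the original model."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


# Hypothetical company (all figures in the same currency units).
z = altman_z(working_capital=100, retained_earnings=200, ebit=80,
             market_value_equity=600, sales=1200,
             total_assets=1000, total_liabilities=500)
print(f"Z = {z:.3f} ({zone(z)})")
```

For these made-up figures the score is about 2.58, which falls in the grey zone between the distress and safe cut-offs.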

During the workshop, Professor Altman recounted how a company (GTI Corporation) managed itself out of near-failure using his Z-score. I’m not denying the facts of the story, and I’m not even saying that using the Z-score at GTI didn’t help the turnaround. I am proposing that using Altman’s Z-score to manage a turnaround would be ill-advised.

Download the full Viewpoint below.

Don’t use the Z-score to manage a turnaround

Directors’ Dealings – Information, Noise and the role of Randomness

I was reading an article about the directors of Imperial and Steinhoff purchasing shares in their respective companies. Steinhoff directors made the news when the CEO originally used Single Stock Futures to gear his exposure to a significant position in the company. (This position has since been converted into a direct position with about the same total exposure.)

This raises the question of whether directors’ dealings provide information as to the future performance of the company.

Information content of directors’ sales

Directors may sell shares for many reasons unrelated to any perceived mispricing of the share. A director may sell to raise cash for other purposes, or to reduce concentrated exposure to a single company (often in a single country) when their salary and bonuses are already tied to that company’s fortunes.

Information content of directors’ purchases

On the other hand, directors buying shares in their own companies is usually a good sign. Setting aside management’s general optimism about their own businesses, directors who buy more shares are backing a belief that the business is significantly undervalued, which makes this a meaningful signal. This is especially true given the reasons above for reducing exposure to your own company.

Empirical evidence

I don’t have the references at hand, but I’m fairly confident that this has been supported to some extent by empirical “event” studies that analyse the relationship between Total Shareholder Return and purchases and sales by known company insiders.
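As a rough illustration of what such an event study measures, the sketch below computes market-adjusted cumulative abnormal returns (CARs) around each director trade, plus a simple cross-sectional t-statistic for the average CAR. The return series are hypothetical, and the market-adjusted model (stock return minus market return) is the simplest possible choice; published studies often estimate a market model over a pre-event window instead, and use far more than two events.

```python
import math
import statistics


def cumulative_abnormal_return(stock_returns, market_returns):
    """Market-adjusted CAR: sum of (stock - market) returns over the event window."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must cover the same event window")
    return sum(s - m for s, m in zip(stock_returns, market_returns))


def car_t_statistic(cars):
    """Cross-sectional t-statistic for the mean CAR across events."""
    return statistics.mean(cars) / (statistics.stdev(cars) / math.sqrt(len(cars)))


# Hypothetical daily returns for two director-purchase events (3-day windows).
events = [
    ([0.010, 0.020, -0.005], [0.005, 0.010, 0.000]),
    ([0.000, -0.010, 0.030], [0.010, -0.020, 0.010]),
]
cars = [cumulative_abnormal_return(s, m) for s, m in events]
print("CARs:", [round(c, 4) for c in cars])
print(f"mean CAR = {statistics.mean(cars):.4f}, t = {car_t_statistic(cars):.2f}")
```

A consistently positive average CAR after purchases (and a weaker or negative one after sales) is the kind of pattern such studies look for.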

The other side

Steinhoff’s recent performance has been quite horrible. A mixture of trading conditions and global economic changes (exchange rates amongst others) has meant that the expected operating performance of the business has deteriorated. The share has been de-rated accordingly: see the chart below (the purchase was in July 2007).

Steinhoff Share Price Performance 2007

So in this case, a heavily geared vote of confidence in the company by knowledgeable managers did not provide useful information to outside investors, at least over the six or so months since the purchase. But before we throw the baby out with the bathwater, consider the following points:

  • This is one example. We (hopefully by now) don’t expect to hit gold with every turn of the wheel.
  • Six months is a short period for committed investors (not traders), who may be happy to stick with the company for 5, 10 or 50 years. If the company’s prospects are good, they should stay good for longer than six months.
  • Closely related to this is the idea that the swings of the global economy and exchange rates, and the day-to-day vagaries of consumer confidence, inflation and interest rates, are difficult to predict. We should not judge a decision made at a point in time solely on its outcome. We must consider all the other possible outcomes as well, weighted by their probability of occurring; only then can we fairly assess whether the decision was appropriate.
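The last point can be made concrete with a small worked example: assign probabilities to a few possible outcomes of a decision and judge the decision on its probability-weighted result, not on whichever outcome happened to occur. The scenarios and probabilities below are purely hypothetical and are not an assessment of Steinhoff.

```python
# Hypothetical six-month outcomes for a share purchase: (probability, return).
SCENARIOS = [
    (0.5, 0.25),   # trading conditions improve
    (0.3, 0.05),   # roughly flat
    (0.2, -0.30),  # currency and consumer shocks hit earnings
]


def expected_return(scenarios):
    """Probability-weighted return across all scenarios."""
    return sum(p * r for p, r in scenarios)


def probability_of_loss(scenarios):
    """Total probability of the scenarios that lose money."""
    return sum(p for p, r in scenarios if r < 0)


assert abs(sum(p for p, _ in SCENARIOS) - 1.0) < 1e-12  # probabilities must sum to 1
print(f"expected return: {expected_return(SCENARIOS):.1%}")
print(f"chance of a loss: {probability_of_loss(SCENARIOS):.0%}")
```

Here the decision has a positive expected return of 8%, yet one time in five it produces a 30% loss. Observing that loss on its own would not show that the decision was a poor one.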

So what now?

Am I going to invest in Steinhoff? Well, no, not yet: not until I have done some proper research into the company’s fundamentals, and not until I properly understand the reasons for the share price decline over the last year. If the market thinks the company is worth less, I had better know why before I disagree too strongly.

Having said that, I pay careful attention when knowledgeable insiders put their money where their collective mouths are, voting with their personal wealth and risk appetite that a company is a good bet.

Additional Analysis of SEOmoz web popularity data

SEOmoz provide some great resources on search engine optimisation (“SEO”). Recently, they performed a really interesting analysis comparing actual site traffic for 25 sites that volunteered their data against a range of competitive intelligence metrics from sources such as Google PageRank, Technorati Rank, Alexa Rank and SEOmoz’s very own Page Strength Tool. The stated goal of the project is described in this quote from their page:

This project’s primary objective is to determine the relative levels of accuracy for external metrics (from sites like Technorati, Alexa, Compete, etc.) in comparison to actual visitor traffic data provided by analytics programs. 25 unique sites, all in the search & website marketing niche, generously contributed data to this project. Through the statistics provided, we can also get a closer look at how the blog ecosphere in the search marketing space receives and sends traffic

You can find the commentary on their updated analysis and also the original article (updated too, I understand).

Now, I’m not yet an expert on SEO, but I do know a few things about data analysis. Whereas their results indicate that none of the measures are particularly useful, I have three points to add:

1 Significance of correlation coefficients

A correlation coefficient does not need to be 0.9 or 0.95 to be significant, as the article suggests:

Technorati links is actually an almost usable option at this point, though any scientific analysis would tell you that correlations below 90-95% shouldn’t be used.

Roughly speaking, correlation coefficients greater than about 0.7 or 70% explain approximately half the variability in the observed variable (actual page visits). Whether or not this is “significant” depends on the amount of data used to measure the correlation. There are some very specific tests for measures of significance for correlation coefficients – I have summarised the results of one of the standard tests here:

SEOmoz data Correlation Significance Table
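One standard test (I am assuming it is comparable to the one summarised in the table) is a t-test on the correlation coefficient: with n observations, t = r·sqrt(n − 2) / sqrt(1 − r²) follows a Student t distribution with n − 2 degrees of freedom when the true correlation is zero. The sketch below applies it to n = 25 sites. The critical value 2.069 (two-tailed 5% level, 23 degrees of freedom) is taken from standard t tables and hard-coded to keep the example dependency-free.

```python
import math

N_SITES = 25
T_CRIT_23DF = 2.069  # two-tailed 5% critical value for 23 df (standard t tables)


def correlation_t_stat(r, n):
    """t-statistic for H0: true correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that is significant, found by inverting the t-statistic."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


r_min = critical_r(T_CRIT_23DF, N_SITES)
print(f"|r| needed for 5% significance with {N_SITES} sites: {r_min:.3f}")
for r in (0.3, 0.5, 0.7):
    t = correlation_t_stat(r, N_SITES)
    verdict = "significant" if abs(t) > T_CRIT_23DF else "not significant"
    print(f"r = {r:.1f}: r^2 = {r * r:.2f}, t = {t:.2f} -> {verdict}")
```

With 25 sites, any correlation above roughly 0.40 in absolute value is significant at the 5% level, far below a 90-95% threshold, although an r of 0.4 still explains only about 16% of the variance.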

Beyond the technical statistical tests though, I would imagine that there is a great deal of value in estimating a large part of the practical popularity of a website (and presumably page visits is a sensible measure of this) through freely available “competitive intelligence metrics”. On the other hand, if you are looking for a near-exact replica of actual visits, then a much higher correlation coefficient is required.

2 Extending analysis to multiple regression rather than single correlations

OK, this does take the analysis beyond the original stated goal, but it is interesting to see how good a model of actual site popularity we can develop based on freely available “competitive intelligence metrics”. But first, it is useful to consider the correlation matrix between all variables (the “dependent variable” and all independent variables). In an ideal regression model, the independent variables will be uncorrelated with each other. On the other hand, if these metrics are any good, we would expect them to be strongly correlated with each other.
SEOmoz data Correlation Matrix
As can be seen from the table above, there are several strong correlations between the independent variables. This can lead to problems with “multicollinearity” for multiple regression techniques, but since I am trying to keep this post non-technical, I’ll leave that alone for now. It is also interesting that while all the large (loosely defined here as greater than 70% or less than -70%) correlations are positive, there are many negative correlations as well. Thus, some measures appear to be using different information or approaches to provide the metrics. Most interesting to me is that TR Rank and TR Link have a correlation coefficient of -50%. This hints at our multiple regression results…
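A correlation matrix like this is straightforward to compute, and a quick gauge of multicollinearity between a pair of predictors is the variance inflation factor, which for a model with exactly two predictors reduces to VIF = 1 / (1 − r²). The sketch below uses hypothetical values for three made-up columns (`visits`, `links`, `rank`), not the SEOmoz data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


def two_predictor_vif(r):
    """Variance inflation factor when a model has exactly two correlated predictors."""
    return 1.0 / (1.0 - r * r)


# Hypothetical data for five sites (NOT the SEOmoz figures).
data = {
    "visits": [120, 340, 560, 800, 1500],
    "links": [10, 25, 50, 60, 140],
    "rank": [900, 700, 400, 350, 100],  # a rank: smaller number = more popular
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    if a < b:  # print each pair once
        print(f"{a:>6} vs {b:<6}: {r:+.2f}")
print(f"VIF at r = 0.7: {two_predictor_vif(0.7):.2f}")
print(f"VIF at r = -0.5: {two_predictor_vif(-0.5):.2f}")
```

A VIF near 1 means a predictor’s coefficient is barely affected by the other; values that grow well above 1 signal that the two predictors are carrying overlapping information.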
I decided to use only very basic tools for the analysis so that interested readers can repeat it on their own with only MS Excel (generally a fairly weak statistics platform, even with the Data Analysis add-in activated). My aim was to find a model that explained more of the Average Visits than Technorati Links alone, by combining several variables. I had to exclude Compete Rank and Ranking Rank due to the limitations of Excel’s regression tools. I judged models as “good” if they had a high adjusted R-squared and significant, sensible estimates for individual variables. The results of a “good” model (although not necessarily the best, since my model selection was fairly quick and dirty) are given below:

SEOmoz data Multiple Regression Results

SEOmoz data Multiple Regression Results Summary

The model has a “Multiple R” (intuitively analogous to the usual Pearson correlation coefficient) of 89%, and the model explains 80% of the variability in Average Visits. Other measures of goodness of fit include a high adjusted R-squared (relative to other models fitted) of 71%, an F-statistic for overall model significance of 9.5, which gives a significance level or p-value of 0.00008, and low p-values for most independent variables in the model. The intercept itself is not significant, but we leave it in to improve the overall fit of the model. Similarly, while the significance level for Alexa Page Views is relatively high at 17%, it does add to the overall model in terms of fitting the data well.
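For readers who want to reproduce this kind of output outside Excel, the sketch below fits ordinary least squares via the normal equations and reports R-squared, adjusted R-squared and the overall F-statistic, the summary measures discussed in the paragraph above. The regression data are hypothetical. The post does not state how many predictors the final model used, so the last two lines use n = 25 and R² = 0.80 from the text together with a purely hypothetical k = 7 predictors, just to show how the measures relate.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def ols_summary(X, y):
    """OLS with intercept: coefficients, R^2, adjusted R^2 and overall F."""
    n, k = len(y), len(X[0])
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return beta, r2, adjusted_r2(r2, n, k), f_statistic(r2, n, k)


# Hypothetical data: y = 2 + 3*x1 - x2 + small noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.05]
y = [2 + 3 * a - b + e for a, b, e in zip(x1, x2, noise)]
beta, r2, adj, f = ols_summary(list(zip(x1, x2)), y)
print("coefficients:", [round(b, 3) for b in beta])
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}, F = {f:.1f}")

# Text's n = 25 and R^2 = 0.80, with a HYPOTHETICAL k = 7 predictors.
print(f"adjusted R^2 = {adjusted_r2(0.80, 25, 7):.3f}, F = {f_statistic(0.80, 25, 7):.2f}")
```

With n = 25, R² = 0.80 and the assumed k = 7, this gives an adjusted R-squared of about 0.718 and an F of about 9.71. Adjusted R-squared falls further below R-squared as k grows relative to n, which is why it is the fairer yardstick when comparing models with different numbers of variables.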

SEOmoz data Multiple Regression fitted model

Again, interestingly but not surprisingly by now, many of the coefficients are negative. This implies that, at least after adjusting for the other variables, these measures are associated with lower rather than higher Average Visits. This suggests that more analysis and more data are needed to understand the dynamics here properly!

3 Quality and quantity of data
This leads me to my final comment. Data on 25 websites, while great to have, is not nearly enough to analyse this problem. This isn’t because 25 sites is small relative to the total number of websites on the ’net, but rather because of the spread of sites across the different types of websites and the potential to fit the model too closely to the exact data provided rather than to some underlying reality. Again, this is a difficult area to discuss correctly and thoroughly without becoming very technical, so I’ll leave that well alone too.

Final comments

This analysis and presentation of results is very light for something this interesting. There is an enormous amount more that could be done with time, energy, more data and, for my part, a better understanding of how each of these competitive intelligence metrics is intended to work. I’d welcome any comments on what analysis would be desired (time-series? Non-linear models? More detailed regression? Rank correlation?) and whether there is any chance of getting more data. I’d be very happy to dig deeper and post the results here and/or directly on