Do Data Lakes hide Loch Ness Monsters?

I had a discussion with a client recently about the virtues of ensuring data written into a data warehouse is rock solid and understood and well defined.

My training and experience have given me high confidence that this is the right way forward for typical actuarial data. Here I’m talking about in-force policy data files, movements, transactions, and so on. This is well-structured data that will be used many times by different people, and it can easily be processed once, “on write”, and stored in the data warehouse to be reliably and simply retrieved whenever necessary.

The particular issue was effectively about a single transaction, a single claim, being recorded as two separate items because administration system limitations required the claim to be processed as two equal amounts rather than one. I can understand also wanting to store the two separate transactions for audit trail purposes, but for movements, analysis runs and checking against policy rules, the data needs to reflect the product, business, actuarial model and customer reality of a single claim.

If the data is permanently stored as two separate, equal transactions, every end consumer of that data will always need to know to combine them, and to know that they’re not duplicates (well they are, but they’re not – see how confusing it can be?). It really is standard practice for data warehouses to apply “schema on write”. This covers the structure of the data, but also the cleaning, transformation and application of business and validation rules so that the data fits its definitions exactly.
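To make the “on write” idea concrete, here’s a rough sketch of what a load step might do with that split claim, using pandas. The column names, the 50/50 split and the amounts are all made up for illustration; the point is that the combining rule is applied once, before anyone queries the warehouse.

```python
import pandas as pd

# Raw admin-system extract: one claim forced through as two equal transactions.
# Policy/claim numbers and amounts are made up for illustration.
raw = pd.DataFrame({
    "policy_no":  ["P123", "P123"],
    "claim_no":   ["C001", "C001"],
    "txn_no":     [1, 2],
    "claim_type": ["death", "death"],
    "amount":     [50_000.0, 50_000.0],
})

# Schema on write: apply the business rule once, at load time, so the
# warehouse stores the single claim the business actually recognises.
claims = (
    raw.groupby(["policy_no", "claim_no", "claim_type"], as_index=False)
       .agg(amount=("amount", "sum"), txn_count=("txn_no", "count"))
)

print(claims)
# One row: policy P123, claim C001, amount 100,000, built from 2 transactions.
```

The raw two-row extract can still be kept separately for the audit trail; the warehouse table simply stops pretending the claim was two events.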

And then I heard about Data Lakes. At first, these seem a little like a new buzzword for a data warehouse, but in fact the concept is quite different. The idea behind a data lake is to store large amounts of unstructured data, as it comes, and figure out what to do with it later.

This is “schema on read”.

Imagine an actual lake, full of different things: sand and mud and fish and seaweed, but also boats and piers and old anchors, cool water and warm water, and maybe the occasional Loch Ness Monster. Some of it may be structured, most of it not, and you don’t need to define what you will store ahead of time. You just throw it all in the data lake and “figure out what to do with it later”.
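As a contrast to the earlier sketch, here’s roughly what “figuring it out later” can look like in code: structure, typing and business rules only get applied when someone reads from the lake. The folder name and field names are assumptions for the example, not any real system.

```python
import json
from pathlib import Path

import pandas as pd

# The "lake": raw files dumped as they arrived, with no schema enforced on write.
# The folder name and field names here are illustrative assumptions.
LAKE_DIR = Path("data_lake/claims_raw")

def read_claims(lake_dir: Path = LAKE_DIR) -> pd.DataFrame:
    """Schema on read: structure, typing and business rules applied at query time."""
    records = []
    for path in lake_dir.glob("*.json"):
        with path.open() as f:
            records.extend(json.load(f))  # assumes each file holds a list of claim dicts
    df = pd.DataFrame.from_records(records)

    # Every consumer has to repeat this interpretation work themselves:
    # coerce types, drop junk rows, recombine split transactions, and so on.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    return df.dropna(subset=["amount"])
```

Notice that the cleaning lives in the reader, not the store: every consumer of the lake has to know, and repeat, the same interpretation rules.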

The value of a data lake is that it is flexible. Nothing is lost from the original data. If a brand new purpose for the data arises that wasn’t even imagined when the data was originally stored, no avenues have been closed off. This is not true for a data warehouse.

However, for the types of data used in a standard actuarial valuation, a rigid, reliable data source is required: the exact data to be used must be locked down to ensure results are consistent, that all valid policies are valued, and that the analysis of experience and surplus works correctly.

If a data lake is to be used, then another store of highly structured, cleaned, transformed and validated data will be required to support the time-sensitive standard valuation processes. Whether this is a data warehouse feeding off the data lake, or just a simple database, is I suppose up for debate. But a data lake on its own holds too many monsters to be used raw.
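If I were sketching that downstream store, it might look something like the function below: one place where the lake’s raw data is forced through the valuation’s rigid definitions before any run begins. The column names and the in-force rule are illustrative assumptions only.

```python
import pandas as pd

# Columns the valuation process insists on; names are illustrative only.
REQUIRED_COLS = ["policy_no", "status", "sum_assured", "valuation_date"]

def build_valuation_extract(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn raw lake data into the rigid, validated table a valuation run needs.

    Cleaning and validation happen once, up front, rather than being
    rediscovered inside every valuation, experience or surplus analysis.
    """
    missing = [c for c in REQUIRED_COLS if c not in raw.columns]
    if missing:
        raise ValueError(f"Lake extract is missing required columns: {missing}")

    extract = raw.loc[:, REQUIRED_COLS].copy()
    extract["sum_assured"] = pd.to_numeric(extract["sum_assured"], errors="coerce")

    # Hard validation: every in-force policy must carry a usable sum assured.
    bad = (extract["status"] == "in_force") & (extract["sum_assured"].isna())
    if bad.any():
        raise ValueError(f"{int(bad.sum())} in-force policies failed validation")

    return extract[extract["status"] == "in_force"].reset_index(drop=True)
```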
