I’m a fan of testing in each DWH project, because it allows to:
Checking whether we could carry all required data into DWH without ‘spilling’ it out.
Common things to check:
Common mistakes found:
Checking dimension mapping for a selected dimension. This extends previous test by including dimension totals for checking.
So if we want to check whether ‘products’ were mapped correctly — we compare totals(sums) by time, store, etc, listing all dimensions except products.
If counts(sums) differ in source\DWH — we know that products were mapped incorrectly. Moreover we know the specific data subset containing error, which helps a lot as well.
Checking DWH calculations. If we’re doing some data transformation\calculating something in DWH — it should be tested as well.
2 approaches to testing:
These two approaches should be combined.
Checking some heuristic expectations, based on previously loaded data.
If you’re using partitioned loads ‘last_load’ partition can be used for such testing.
It’s best when tests are written not by procedure developer himself. You can always apply cross-checking (you write tests for my procedures, I write for yours).
Pay attention to Chapter 4 of ETL Toolkit – this post is just a simple list of typical tests, the methodology is described there.