The more time I spend doing BI\DWH projects, the more aware I become that such projects are totally a ‘craft’, as my professor once said. And this ‘craft’ has many different aspects, each covered by a number of very good texts telling you all you need to know to avoid simple mistakes and deliver predictable results. But I encounter very few people who have read even half of these books. Maybe it’s the ease of ‘reinventing the bicycle’, maybe it’s the boredom of reading about others’ mistakes instead of earning your own scars. So here’s another ‘book list’ of texts that make up a competent BI\DWH professional.
I’ve tried to split it up into sections of interest.
DWH Classic texts
There is no way you can do DWH modeling, ETL design, or even report development without reading at least some of the Kimball classics (Dimensional Modeling, ETL Toolkit, Lifecycle Toolkit). And if you’re into this for more than one project, you should start learning the Lifecycle Toolkit by heart.
You should read Bill Inmon as well, to be aware of the two different approaches to modeling. Also see the Data Modeling section just below.
Data Modeling
These books are not DWH-specific and most modeling tricks don’t apply directly, but you should be aware of typical problems and solutions, since this lets you examine and profile a source system’s model quickly: you can recognize the modeling techniques used in its development and the potential pitfalls in its data.
There are numerous data modeling books out there; I’d recommend reading Hoberman and Witt & Simsion.
And I insist on reading the Data Model Resource Book series. It has 3 volumes, and I suggest starting with the 3rd one, then picking your business area of interest (healthcare, telco, insurance; there’s a lot to choose from) and looking up the model in the first two volumes.
Achieving DWH performance
If you’re interested in getting it all to run faster, take a look at Mastering Data Warehouse Aggregates. And there’s deep insight to be found in Relational Database Index Design and the Optimizers; read this one only if you’re really into performance )
Data Quality
It’s a very important, yet often forgotten topic. After reading the ETL Toolkit and Lifecycle Toolkit parts on data quality, I suggest following up with one of the works by Larry English. If you’re very serious about a separate DQ project, pick up Executing Data Quality Projects.
Data Visualization & Report Design
Data visualization is the most anticipated BI revolution, and it won’t come the way vendors think it will (as another feature pack or product). It will come by evolution, not of BI products, but of us, BI professionals and data analysts. As we get more acquainted with analysis techniques and gain more experience with well-designed visual aids (as opposed to Flash-based eye candy), we’ll shift our demands towards more analytically oriented tools, and vendors will have to provide them )
I’d recommend that anyone skim through Head First Data Analysis, since it’s a good introductory book on how to gain insights from data.
Anyone designing any dashboard should read Stephen Few’s books. It’s an absolute must. Stop the pie chart invasion!
If you’re up for thinking about new ways of representing information, or want a deeper understanding of how things work, read Tufte’s works. And then follow up with Colin Ware’s in-depth texts on how our minds really work while processing visual information.
Product documentation
And, of course, you should read all the documentation provided by the vendors of your tools and follow the most prominent blogs (there are some on almost every tool out there) and forums. It’s a shame to see people reinventing built-in functionality, or not realizing that it’s a screwdriver they’re trying to hammer a nail in with. I’ve caught myself doing this with TM1 recently, so I’m back to reading manuals.
I’ll update this post when I encounter a good text on MDM or more good books on the above-mentioned subjects.
Any suggestions?