I’ve recently completed my first real DataStage project and took the chance to get certified while everything is still “fresh”. The certification exam itself is quite complex, and I hadn’t used most of the tricks covered in the questions until one of the jobs had to process a quarter of a billion rows in a reasonable timeframe. From that point on I learned quite a lot about partitioning, balancing, debugging and choosing the right stages for the job (who would’ve thought that RemoveDuplicates is waaaaay slower than Sort with the Remove Duplicates option? Why put an RD stage in at all?). Anyhow, now I’m also an IBM Certified Solution Developer, InfoSphere DataStage v8.5 :)
So my current ETL tools breakdown goes something like this (not counting PoCs and the like):
- Oracle Data Integrator: 3 projects
- Pentaho Data Integration: 2 projects
- IBM InfoSphere DataStage: 1 project
And that’s my current preference list as well. I love ODI’s flexibility (it’s actually very simple once you get how it works, and it’s extremely configurable) and its ELT approach (I’d rather tune just my DBMS than a DBMS plus a separate ETL engine). PDI is very open and quite user-friendly (compared to DS, for example), and it’s easier to understand and debug than ODI. The PDI community edition is enough for most integration projects with small data volumes, and the enterprise version is very affordable. DataStage is terrifically well suited to tasks with big data volumes and parallel processing, but is quite an overkill for small projects.
It’s interesting that although I’ve done quite a bit of DWH model design, I have written just a few posts on the topic. But every time I think about writing out some advice, I conclude that the best advice is to just go read the books. And if you still have questions, reread them :) I’ve reread Kimball’s books a few times already, and every time I get an “ah, that’s what they meant” moment based on my recent experience.
Anyhow, my last couple of major DWH projects were for government agencies, and I put together a number of simple but effective modeling tips specifically for them. Hopefully I’ll write them up in the near future. I just need a free weekend or two.