A check-list for using Microsoft SQL Server as a reporting database. This is a ‘be aware’ list for myself in first place, since I forget some details as time passes.
Set up simple recovery mode for database to minimize performance issues with transaction log records. Usually you can use daily backups with DWH, since data is loaded daily. In more complex situations you can use ETL-based ‘checkpoints’, but in general, daily backups are more than enough.
Use decent I\O systems and separate data and log I\O channels. This is obvious, just reiterating.
RAID 1+0 seems to be preferable choice of disk configuration for DWH loads. See this entry, this one or test it yourself )
Use SQL Server 2008r2 Enterprise if you have a choice. 2008 adds table\index compression (which is great) and get’s partitioning to work way better in queries. And I’ve 20% performance speed-up against 2005 on the same query set (with same execution plans and that surprised me).
Set Auto shrink off, it creates unpredictable I\O strain. Shrink database daily, just after backup in the same maintenance task.
Use dentity columns as table primary keys. And using this column as clustered index will help I\O in most cases.
Again, obvious, but I’ve seen this violated more than once. Use foreign keys for fact-dimension table reference. This will help optimizer a lot.
You can disable these keys to speed up inserts, but in 90% cases it’s not needed.
Always use partitioned tables (even fake, with 1 partition) to:
SQL Server 2008 introduces table compression which helps to reduce I\O (but increases CPU load), which is very good for reporting tasks. Use it for fact tables.
More benefits of partition switching for loading:
You can implement switching by creating another temporary partition, which I like to call ‘last load’. So you switch:
After that you have ‘last_load’ to:
Another 2008-only feature is called filtered indexes. These indexes can be essentially smaller than common ones if you know for sure what region of data you’ll be accessing. And this turns out to be very efficient for SQL code you’re using for different calculations. Since I mostly work with Oracle Data Integrator as an ETL tool and it’s an SQL-generating machine: I get up to 10 times calculation speed up by building small, calculation-specific filtered indexes.
All inserts should be done with TABLOCK
hint, which speeds up insert by acquiring a full table lock during the whole transaction. Since in DWH main inserts are performed by ETL there should be no conflicts. TABLOCK also speeds up logging during bcp.
Really a vast topic, but to sum it up ) :
Be very-very careful, ‘your mileage may vary’:
See Books session below, skip through at least one of them before starting it out wild.
As with ETL calculations — filtered indexes may be used for tuning specific queries.
Indexes can be compressed in the same way as tables (which is obvious given that a table is a clustered index) and this speeds queries by reducing IO
I’d recommend following: