Applied Dimensionality

Microsoft SQL Server reporting database configuration

Posted at — Sep 23, 2010
Microsoft SQL Server reporting database configuration

A check-list for using Microsoft SQL Server as a reporting database. This is a ‘be aware’ list for myself in first place, since I forget some details as time passes.

Overall database settings

Use simple recovery mode<

Set up simple recovery mode for database to minimize performance issues with transaction log records. Usually you can use daily backups with DWH, since data is loaded daily. In more complex situations you can use ETL-based ‘checkpoints’, but in general, daily backups are more than enough.

Separate log and data io channels

Use decent I\O systems and separate data and log I\O channels. This is obvious, just reiterating.

RAID 1+0 seems to be preferable choice of disk configuration for DWH loads. See this entrythis one or test it yourself )

2008 r2

Use SQL Server 2008r2 Enterprise if you have a choice. 2008 adds table\index compression (which is great) and get’s partitioning to work way better in queries. And I’ve  20% performance speed-up against 2005 on the same query set (with same execution plans and that surprised me).

Auto shrink Off

Set Auto shrink off, it creates unpredictable I\O strain. Shrink database daily, just after backup in the same maintenance task.

Table configuration

identity columns as primary keys<

Use dentity columns as table primary keys. And using this column as clustered index will help I\O in most cases.

Set up foreign keys

Again, obvious, but I’ve seen this violated more than once. Use foreign keys for fact-dimension table reference. This will help optimizer a lot.

You can disable these keys to speed up inserts, but in 90% cases it’s not needed.

Table partitioning

Always use partitioned tables (even fake, with 1 partition) to:

Table compression in 2008

SQL Server 2008 introduces table compression which helps to reduce I\O (but increases CPU load), which is very good for reporting tasks. Use it for fact tables.

ETL

Partition switching as default load strategy

More benefits of partition switching for loading:

You can implement switching by creating another temporary partition, which I like to call ‘last load’. So you switch:

After that you have ‘last_load’ to:

Filtered indexes for calculations

Another 2008-only feature is called filtered indexes. These indexes can be essentially smaller than common ones if you know for sure what region of data you’ll be accessing. And this turns out to be very efficient for SQL code you’re using for different calculations. Since I mostly work with Oracle Data Integrator as an ETL tool and it’s an SQL-generating machine: I get up to 10 times calculation speed up by building small, calculation-specific filtered indexes.

Tablock hint

All inserts should be done with TABLOCK hint, which speeds up insert by acquiring a full table lock during the whole transaction. Since in DWH main inserts are performed by ETL there should be no conflicts. TABLOCK also speeds up logging during bcp.

Query tuning

Indexes

Really a vast topic, but to sum it up ) :

Be very-very careful, ‘your mileage may vary’:

See Books session below, skip through at least one of them before starting it out wild.

Filtered indexes

As with ETL calculations — filtered indexes may be used for tuning specific queries.

Index compression

Indexes can be compressed in the same way as tables (which is obvious given that a table is a clustered index) and this speeds queries by reducing IO

Books to read

I’d recommend following:

comments powered by Disqus