Pages
Usefull Links
Cognos:
IBM DeveloperWorks Forum Cognoise Community
OlapForums, main pool of TM1 expertise
Cognos forum on Tek-Tips
OLAP&DWH on sql.ru -- here you can get some advice in Russian
Hyperion:
Oracle Essbase Forum
network45 board
Hyped Forums
Categories
- analyst (2)
- BI&CPM (71)
- blogging (15)
- coding (11)
- cognos (129)
- bi (44)
- controller (2)
- ep cookbook (46)
- contributor-only (14)
- currency (1)
- large dimensions (4)
- tips (17)
- tm1 (13)
- ep (16)
- essbase (6)
- etl (4)
- ibm (7)
- open source (8)
- software (4)
- confluence (3)
- Uncategorized (1)
Comments
Powered by DisqusRecent Comments
Tags
access manager analyst aso asp bi bo break-back certification closed_cubes cognos cognos certification cognos sdk confluence contributor controller dlists enchancements ep ep 8.2 ep publish essbase event manager framework hyperion ibm j java jobservers m&a macros Microsoft odi olap open source oracle package pentaho performancepoint powerplay scripts sql tips tm1 transformer wordpress
ETL Testing
I’m a fan of testing in each DWH project, because it allows to:
- be sure that at least tested parts work
- change logic w\o retesting all results
Receiving “All OK” in the morning means that there’s time for a cup of coffee )
A list of approaches I use for testing ETL-procedures:
Checking whether we could carry all required data into DWH without ‘spilling’ it out.
Common things to check:
- row counts in source\DWH
- grand totals in source\DWH
Common mistakes found:
- precision errors
- missing dimension mappings
Checking dimension mapping for a selected dimension. This extends previous test by including dimension totals for checking.
So if we want to check whether ‘products’ were mapped correctly — we compare totals(sums) by time, store, etc, listing all dimensions except products.
If counts(sums) differ in source\DWH — we know that products were mapped incorrectly. Moreover we know the specific data subset containing error, which helps a lot as well.
Checking DWH calculations. If we’re doing some data transformation\calculating something in DWH — it should be tested as well.
2 approaches to testing:
- overall logic testing. For example, if we’re allocating HQ expenses to get regional reports it logical to expect the overall sum of expenses to stay the same )
- testing a specific data subset. We can select a single account and verify logic on it.
These two approaches should be combined.
Checking some heuristic expectations, based on previously loaded data.
1) There’s no way daily sales can jump 70% compared to quarter average
2) We’re usually getting about this number of rows from this source
3) If we’re reloading 3 months of data daily — we expect modest amount of changes in past days.
If you’re using partitioned loads ‘last_load’ partition can be used for such testing.
It’s best when tests are written not by procedure developer himself. You can always apply cross-checking (you write tests for my procedures, I write for yours).
Pay attention to Chapter 4 of ETL Toolkit — this post is just a simple list of typical tests, the methodology is described there.
Microsoft SQL Server reporting database configuration
A check-list for using Microsoft SQL Server as a reporting database. This is a ‘be aware’ list for myself in first place, since I forget some details as time passes.
Table of contents
Overall database settings
Use simple recovery mode
Set up simple recovery mode for database to minimize performance issues with transaction log records. Usually you can use daily backups with DWH, since data is loaded daily. In more complex situations you can use ETL-based ‘checkpoints’, but in general, daily backups are more than enough.
Separate log and data io channels
Use decent I\O systems and separate data and log I\O channels. This is obvious, just reiterating.
RAID 1+0 seems to be preferable choice of disk configuration for DWH loads. See this entry, this one or test it yourself )
2008 r2
Use SQL Server 2008r2 Enterprise if you have a choice. 2008 adds table\index compression (which is great) and get’s partitioning to work way better in queries. And I’ve 20% performance speed-up against 2005 on the same query set (with same execution plans and that surprised me).
Auto shrink Off
Set Auto shrink off, it kicks unpredictable I\O strain. Shrink database daily, just after backup in the same maintenance task.
Table configuration
identity columns as primary keys
Use identity columns as table primary keys. And using this column as clustered index will help I\O in most cases.
Set up foreign keys
Again, obvious, but I’ve seen this violated more than once. Use foreign keys for fact-dimension table reference. This will help optimizer a lot.
You can disable these keys to speed up inserts, but in 90% cases it’s not needed.
Table partitioning
Always use partitioned tables (even fake, with 1 partition) to:
- achieve partition elimination during query execution. Especially in Ms SQL 2008
- use partition switching for data load (check this article as well)
Table compression in 2008
SQL Server 2008 introduces table compression which helps to reduce I\O (but increases CPU load), which is very good for reporting tasks. Use it for fact tables.
ETL
partition switching as default load strategy
More benefits of partition switching for loading:
- while you’re loading data in ‘new_data’ partition, base table is accessible and consistent
- after you’ve prepared ‘new_data’ partition you can check it’s data for quality (planning to write about later)
You can implement switching by creating another temporary partition, which I like to call ‘last load’. So you switch:
- current -> last_load
- new_data -> current
in one transaction
After that you have ‘last_load’ to:
- switch back in emergency
- check data from current\previous load for some more quality checks
Filtered indexes for calculations
Another 2008-only feature is called filtered indexes. These indexes can be essentially smaller than common ones if you know for sure what region of data you’ll be accessing. And this turns out to be very efficient for SQL code you’re using for different calculations. Since I mostly work with Oracle Data Integrator as an ETL tool and it’s an SQL-generating machine: I get up to10 times calculation speed up by building small, calculation-specific filtered indexes.
Tablock hint
All inserts should be done with TABLOCK hint, which speeds up insert by acquiring a full table lock during the whole transaction. Since in DWH main inserts are performed by ETL there should be no conflicts. TABLOCK also speeds up logging during bcp.
Query tuning
Indexes
Really a vast topic, but to sum it up ) :
- use covering indexes. It seems better to have a ‘wide index’ than a number of indexes on all dimensions and to apply index intersection.
- columns in indexes should be ordered by selectivity
Be very-very careful, ‘your mileage may vary’:
- creating a huge number of large indexes may actually slow system down due to I\O overload
- you should try to create indexes that can help a number of reports\query instead of tuning a single request
See Books session below, skip through at least one of them before starting it out wild.
Filtered indexes
As with ETL calculations — filtered indexes may be used for tuning specific queries.
Index compression
Indexes can be compressed in the same way as tables (which is obvious given that a table is a clustered index) and this speeds queries by reducing IO
Books to read
I’d recommend following
And if you have time and will
Cognos PowerPlay Transformer Limitations
Just a bullet list of things that separate Cognos Transformer from being a capable OLAP-tool in modern market. Neither of these problems will be solved, I think, and this will lead to Transfomer’s death in near future (about 5 years, imho, given huge userbase). That’s a pity, since it’s a nice tool, absolutely brilliant 10 years ago. But as data sizes grow, Transformer just cannot scale. Hyperion did a nice job of inventing ASO option when it faced volume challenges, Cognos decided all was good enough.
List:
Given all of the above, powerplay is a nice single-user tool with friendly modeling gui or an OLAP-server in ‘small-data’ enviroment. As a single user tool it will be replaced by PowerPivot and as an OLAP-server it will be replaced by TM1 (which I’m not happy with — see following posts) or Microsoft Analysis Services.
Cognos Powerplay zero suppression
Extracting from a discussion from sql.ru forum:
PowerPlay performs very bad when it comes to large and sparse (with big number of nulls and zeros) cross-tabs.
Modern OLAP engines are specially optimized forNonEmpty MDX clause, which is default report generation option.
PowerPlay was designed for fast cross-tab rendering and maximum amount of calculations on client machine. Nobody changed that, although it was promised. And nowadays dimensions with hundreds of thousands elements are very common and millions are not at all unique.
So if you now want to use Transformer — keep in mind that you should create most used reports in Report Studio, using filter clause to push suppression back to server.
Cognos BI temporary files (UDA)
Just a link to developerWorks thread where Phil_W gives an excellent explanation. Note this guy, all his post are truly brilliant: concise, informative and insightful. I use ProcessExplorer to find PIDs too.
And all the rules for Oracle BI temp files stay in place — I\O speed is crucial, use SSD )
Replacing strings in Oracle BI Publisher reports
Oracle BI publisher works awfully bad with Answers type of datasource in at least two points:
- column names are created by answers formulas instead of column names, so if you have long formulae, it’ll be shortened to some 128 chars. And if first 128 chars of formulas are identical, you’ll get only one of them in publisher
- if columns return null in answers – they will not be available at all
- if columns return null in answers – they will not be available at all
We ended up using logical SQL generated in answers as a source for publisher reports, which works like charm up until you start changing repository or changing underlying data you’re using for filters.
- repository changes can be handled by adding aliases to columns being renamed
- data changes are way more difficult to tackle
For example when you change city name from ‘NY’ to ‘New York’ in geography dimension (this should be solved by BI 11 id\description feature) you have to change all formulas and filters using this value.
So I wrote a simple Python3 script, doing exactly that — replacing all needed strings in publisher xdo files. If there’s any need for that — mail me, I’ll send it over.
Configuring Ms SQL databases for Cognos Enterprise Planning
Just a quick side note — I usually use simple recovery mode in Ms SQL installations because:
- application databases become inconsistent during administration tasks so having a backup of database in a middle of GTP crash doesn’t give you anything, you’ll have to roll back to GTP start time. So backing up before and after administrative task is better.
- publish databases are created via bulk load, which gets rather slow with full logging as log files grow very fast.
In ideal situations, application databases should be in Full Recovery Mode not to lose contributor input and publish databases should be in Simple Recovery mode. But in most cases Simple Recovery Mode and daily backups suffice.
Because otherwise — and it’s oh so common situations — I arrive at the sites with planning databases occupying hundreds of Gb (no, really, I’ve seen another such case just yesterday).
Here’s a simple script that turns all planning databases (selected by cognos_ep prefix) into Simple Recovery Mode and then shrinks them.
USE master; DECLARE DBNames CURSOR FOR SELECT NAME FROM sysdatabases where name like 'cognos_ep%' OPEN DBNames DECLARE @Name varchar(50) FETCH NEXT FROM DBNames INTO @Name WHILE (@@FETCH_STATUS <> -1) BEGIN print @Name exec ('ALTER DATABASE '+ @Name + ' SET AUTO_SHRINK ON WITH NO_WAIT') exec('ALTER DATABASE ' + @Name + ' SET RECOVERY SIMPLE WITH NO_WAIT') exec('ALTER DATABASE ' + @Name + ' SET AUTO_SHRINK ON') exec('ALTER DATABASE ' + @Name +' SET RECOVERY SIMPLE') DBCC SHRINKDATABASE( @Name , 0) FETCH NEXT FROM DBNames INTO @Name END CLOSE DBNames DEALLOCATE DBNames
And a bottom line: Worry about backups! Hire a DBA (setting up backups is a one day job — pay part-time)
Pivot Tables and temp files in Oracle BI EE
Pivot tables
Pivot tables are by far most used data view in Oracle BI Answers, since they provide ‘multidimensional’ analysis capabilities, aka dimension drag&drop, filtering and some basic calculations.
From server-side view — this component provides worst performance, since all future ‘cube’ data has to loaded into BI server, aggregations and formulas calculated and transformed to internal format.
So what happens when user clicks a pivot for 200,000 cells cube?
1) SQL-query is fired into database
2) All resulting rows are loaded into temporary directory of BI server
3) If there are calculations to be performed — initial temp file is read, calculated and new temp written out
There are two important instanceconfig.xml parameters for Pivot tables:
- CubeMaxPopulatedCells – The maximum number of populated cells in the Pivot Table. If this maximum is exceeded, the user receives an error when rendering the pivot table.
and
- CubeMaxRecords – The maximum number of records returned by a query for the Pivot Table engine to process. This roughly governs the maximum number of cells that can be populated in a pivot table (unpopulated cells in a sparse pivot table do not count). If this maximum is exceeded, the user receives an error when rendering the pivot table.
All instanceconfig.xml options can be found here.
Temp files.
Temp files can be rather huge, I’m currently seeing files about 1gb size for medium sized reports. That, surely, because we’re using MS SQL server and BI Server cannot “push” some calculations in database, but it’s nice to keep in mind the fact that temp files can occupy some 20-30 Gb.
Temp files are cleaned out when session logs out, so some of them stay there for whole day.
Location of temporary files is set up by WORK_DIRECTORY_PATHS variable in NQSConfig.ini and by SATEMP enviroment variable. See this post for instructions on how to move tmp folder.
Speeding things up.
Rather huge files are written to temporary directory, so the IO performance becomes an obvious bottleneck for process.
What can be done:
1) Use multiple tmp\ directories on different disks to balance the load
2) Use Ramdisks for tmp\ — I’ve tried this way and got a some fantastic 5x speedup on most IO bound reports. But it’s pretty hard to find free 30 Gb of RAM, so we’re using multiple disks for now
3) Use SSD for tmp — that’s what I plan to try next. I bought an SSD for my laptop about a year ago and my life has never been the same, so stay tuned )
Essbase ODBC configuration on Solaris
Using this blog as offline memory. Just ignore this post, if you’re not up to Essbase )
If you’re configuring DataDirect ODBC drivers on *nix system, be aware of following:
- There are 2 versions of odbc.ini file, one in 32bit driver folder, other in 64. 64 essbase uses 64 drivers.
- You should copy required odbc.ini file to essbasepath\bin. And you should create a symbolic link (ln -s) to odbc.ini file in essbase user home folder (/home/hyperion, for example)
- When creating a SQL data source rule file — ServerName corresponds to database server name, not essbase server name (type TNS name for oracle database)
Useful links:
Essbase SQL Interface Documentation
Metalink on search keywords ‘odbc.ini + essbase’ )
BI system metadata impact analysis and what BI system APIs are for
“Okay, this column in our BI metadata was supposed to be named Sales, not Sals.
But when we change it, what will happen to the reports that were created, using this name? And how many reports use this column anyway? Maybe we shouldn’t touch anything to keep things working…”
Ever been in such situation?
This question can be answered by opening each report and then looking at it’s contents ) A horrible task indeed.
Actually, this is exactly what BI system API are made for. Using them you should be able to automagically look into each report and find whether this column is used in it.
And as usual in this ‘BI system Needs and Means’ series, let’s look how it’s done in IBM Cognos BI and Oracle BI.
Cognos BI
Cognos BI has had an SDK (Software Development Kit) since ReportNet (for more than 5 years already) and this ‘long-time in market’ allowed to develop a whole ecosystem of tools to breed on top of it.
Concerning the question at the start of the post — there’s:
Cognos Dynamic Report Updater — a tool to replace change column in every report.
Moreover, look at whole bunch of tools, based on SDK:
and tools, developed by third-party vendors:
Oracle BI
In Oracle BI, finding such changes is somehow easier, since all reports are stored as xml files in presentation catalog folder. You just write a simple script that iterates on files, finds a string in file and reports back it’s findings (or replaces this string with a new one). Sounds easy, but after spending a couple days at it, I’m not so sure anymore )
There are a few major point against this way
1) It’s unsupported, there are no declared report modification API’s, AFAIK, so you’re scripting at your own sake. That would not be a big problem, if not:
2) Oracle BI presentation service adds a hash tag to report (and folder) names if they exceed 55 characters in length (see more on this topic here: http://obiee101.blogspot.com/2010/02/obiee-report-name-length.html).
2.1) If report’s names are not in ASCII, they are converted to UTF-8 (but starting x symbol is replaced by #), so each non-ASCII symbol takes 5. So if you find file, containing desired string, you can only decode back first 9 letters of it’s name — report names are usually much longer than that. Therefore your searching capabilities are seriously limited.
2.2) Folder name hashing poses another problem if you’re using Windows. It impossible to iterate through catalogcontaining hashes an # in it’s name — OS listing methods stop working. Hope that situation is better on Linux. An easy but entirely manual workaround is to rename catalog folders to numbers (1,2,3,4) before starting the script ant to rename it back after.
I totally wish that this situation will change in nearest future and there will be a simple Presentation Server API call that will bring report name from file name and vica versa.
As usual, if anyone is interested in scripts I wrote for the task (python, as usual + vbs) — ping me.