Cognos PowerPlay Transformer Limitations

Just a bullet list of things that separate Cognos Transformer from being a capable OLAP-tool in modern market. Neither of these problems will be solved, I think, and this will lead to Transfomer’s death in near future (about 5 years, imho, given huge userbase). That’s a pity, since it’s a nice tool, absolutely brilliant 10 years ago. But as data sizes grow, Transformer just cannot scale. Hyperion did a nice job of inventing ASO option when it faced volume challenges, Cognos decided all was good enough.
List:

  • No dimension attributes There’s no way to add attributes to dimension elements, only to model them as separate dimensions (or alternate hierarchies), which is very inefficient both for modeling and for reporting. For example, try to display all sales in stores with 200-300 m^2 square footage
  • No multiple languages support No way to keep a single cube for German and English users, only separate cubes for each language and therefore more processing overhead and administration burden
  • Very slow zero suppression See this post
  • Very bad hardware resource optimization Transfomer is essentially a single-threaded application and cannot use more than 2 cores for cube build. Even with top level CPUs it’s just not enough. You can build multiple cubes in parallel to optimize resource usage, but that doesn’t help much either (see following point)
  • No multiple-cubes partitioning Cube processing limitations could be solved by dividing cube in multiple ‘cuboids’ and presenting them a single cube to end-user applications. However, time-partitioning imposes a serious limitation of only one time-dimension in cube and group paritioning doesn’t present a single cube model for analysis
  • 32bit applicationTransformer is a 32bit application and cannot utilize a lot of memory for processing, which could help things a lot. Moreover, famous cube file size limitation comes from this reason as well
  • Not MDX-complaint You can’t use anything except Cognos BI to work with Transformer cubes data. An XML\A provider would be very usefull
  • No Parent-Child hierarchies You can model then by dimension views but it requires some serious scripting. A very nice thing to have ‘out-of-the-box’
  • Given all of the above, powerplay is a nice single-user tool with friendly modeling gui or an OLAP-server in ‘small-data’ enviroment. As a single user tool it will be replaced by PowerPivot and as an OLAP-server it will be replaced by TM1 (which I’m not happy with — see following posts) or Microsoft Analysis Services.

    Cognos Powerplay zero suppression

    Extracting from a discussion from sql.ru forum:

    PowerPlay performs very bad when it comes to large and sparse (with big number of nulls and zeros) cross-tabs.
    Modern OLAP engines are specially optimized forNonEmpty MDX clause, which is default report generation option.
    PowerPlay was designed for fast cross-tab rendering and maximum amount of calculations on client machine. Nobody changed that, although it was promised. And nowadays dimensions with hundreds of thousands elements are very common and millions are not at all unique.
    So if you now want to use Transformer — keep in mind that you should create most used reports in Report Studio, using filter clause to push suppression back to server.

    Cognos BI temporary files (UDA)

    Just a link to developerWorks thread where Phil_W gives an excellent explanation. Note this guy, all his post are truly brilliant: concise, informative and insightful. I use ProcessExplorer to find PIDs too.

    And all the rules for Oracle BI temp files stay in place — I\O speed is crucial, use SSD )

    Replacing strings in Oracle BI Publisher reports

    Oracle BI publisher works awfully bad with Answers type of datasource in at least two points:

    - column names are created by answers formulas instead of column names, so if you have long formulae, it’ll be shortened to some 128 chars. And if first 128 chars of formulas are identical, you’ll get only one of them in publisher
    - if columns return null in answers – they will not be available at all

    We ended up using logical SQL generated in answers as a source for publisher reports, which works like charm up until you start changing repository or changing underlying data you’re using for filters.

    - repository changes can be handled by adding aliases to columns being renamed
    - data changes are way more difficult to tackle
    For example when you change city name from ‘NY’ to ‘New York’ in geography dimension (this should be solved by BI 11 id\description feature) you have to change all formulas and filters using this value.
    So I wrote a simple Python3 script, doing exactly that — replacing all needed strings in publisher xdo files. If there’s any need for that — mail me, I’ll send it over.

    Configuring Ms SQL databases for Cognos Enterprise Planning

    Just a quick side note — I usually use simple recovery mode in Ms SQL installations because:

    • application databases become inconsistent during administration tasks so having a backup of database in a middle of GTP crash doesn’t give you anything, you’ll have to roll back to GTP start time. So backing up before and after administrative task is better.
    • publish databases are created via bulk load, which gets rather slow with full logging as log files grow very fast.

    In ideal situations, application databases should be in Full Recovery Mode not to lose contributor input and publish databases should be in Simple Recovery mode. But in most cases Simple Recovery Mode and daily backups suffice.

    Because otherwise — and it’s oh so common situations — I arrive at the sites with planning databases occupying hundreds of Gb (no, really, I’ve seen another such case just yesterday).

    Here’s a simple script that turns all planning databases (selected by cognos_ep prefix) into Simple Recovery Mode and then shrinks them.

    USE master;
     
    DECLARE 
        DBNames CURSOR
    FOR
        SELECT
    	  NAME 
        FROM sysdatabases
    where name like 'cognos_ep%'
     
    OPEN DBNames
     
    DECLARE @Name varchar(50)
     
    FETCH NEXT FROM DBNames
    INTO @Name
     
    WHILE (@@FETCH_STATUS <> -1)
    BEGIN
    	print @Name
    	exec ('ALTER DATABASE '+ @Name + ' SET AUTO_SHRINK ON WITH NO_WAIT')
     
    	exec('ALTER DATABASE ' +  @Name + ' SET RECOVERY SIMPLE WITH NO_WAIT')
    	exec('ALTER DATABASE ' + @Name + ' SET AUTO_SHRINK ON')
    	exec('ALTER DATABASE ' + @Name +' SET RECOVERY SIMPLE') 
     
        DBCC SHRINKDATABASE( @Name , 0)
     
        FETCH NEXT FROM DBNames
        INTO @Name
     
    END
     
    CLOSE DBNames
    DEALLOCATE DBNames

    And a bottom line: Worry about backups! Hire a DBA (setting up backups is a one day job — pay part-time)

    Pivot Tables and temp files in Oracle BI EE

    Pivot tables

    Pivot tables are by far most used data view in Oracle BI Answers, since they provide ‘multidimensional’ analysis capabilities, aka dimension drag&drop, filtering and some basic calculations.

    From server-side view — this component provides worst performance, since all future ‘cube’ data has to loaded into BI server, aggregations and formulas calculated and transformed to internal format.
    So what happens when user clicks a pivot for 200,000 cells cube?
    1) SQL-query is fired into database
    2) All resulting rows are loaded into temporary directory of BI server
    3) If there are calculations to be performed — initial temp file is read, calculated and new temp written out
    There are two important instanceconfig.xml parameters for Pivot tables:
    - CubeMaxPopulatedCells – The maximum number of populated cells in the Pivot Table. If this maximum is exceeded, the user receives an error when rendering the pivot table.
    and
    - CubeMaxRecords  – The maximum number of records returned by a query for the Pivot Table engine to process. This roughly governs the maximum number of cells that can be populated in a pivot table (unpopulated cells in a sparse pivot table do not count). If this maximum is exceeded, the user receives an error when rendering the pivot table.
    All instanceconfig.xml options can be found here.

    Temp files.

    Temp files can be rather huge, I’m currently seeing files about 1gb size for medium sized reports. That, surely, because we’re using MS SQL server and BI Server cannot “push” some calculations in database, but it’s nice to keep in mind the fact that temp files can occupy some 20-30 Gb.
    Temp files are cleaned out when session logs out, so some of them stay there for whole day.
    Location of temporary files is set up by WORK_DIRECTORY_PATHS variable in NQSConfig.ini and by SATEMP enviroment variable. See this post for instructions on how to move tmp folder.

    Speeding things up.

    Rather huge files are written to temporary directory, so the IO performance becomes an obvious bottleneck for process.
    What can be done:
    1) Use multiple tmp\ directories on different disks to balance the load
    2) Use Ramdisks for tmp\ — I’ve tried this way and got a some fantastic 5x speedup on most IO bound reports. But it’s pretty hard to find free 30 Gb of RAM, so we’re using multiple disks for now
    3) Use SSD for tmp — that’s what I plan to try next. I bought an SSD for my laptop about a year ago and my life has never been the same, so stay tuned )

    Essbase ODBC configuration on Solaris

    Using this blog as offline memory. Just ignore this post, if you’re not up to Essbase )

    If you’re configuring DataDirect ODBC drivers on *nix system, be aware of following:

    1. There are 2 versions of odbc.ini file, one in 32bit driver folder, other in 64. 64 essbase uses 64 drivers.
    2. You should copy required odbc.ini file to essbasepath\bin. And you should create a symbolic link (ln -s) to odbc.ini file in essbase user home folder (/home/hyperion, for example)
    3. When creating a SQL data source rule file — ServerName corresponds to database server name, not essbase server name (type TNS name for oracle database)

    Useful links:

    Essbase SQL Interface Documentation

    Metalink on search keywords ‘odbc.ini + essbase’ )

    BI system metadata impact analysis and what BI system APIs are for

    “Okay, this column in our BI metadata was supposed to be named Sales, not Sals.
    But when we change it, what will happen to the reports that were created, using this name? And how many reports use this column anyway? Maybe we shouldn’t touch anything to keep things working…”
    Ever been in such situation?
    This question can be answered by opening each report and then looking at it’s contents ) A horrible task indeed.
    Actually, this is exactly what BI system API are made for. Using them you should be able to automagically look into each report and find whether this column is used in it.
    And as usual in this ‘BI system Needs and Means’ series, let’s look how it’s done in IBM Cognos BI and Oracle BI.

    Cognos BI

    Cognos BI has had an SDK (Software Development Kit) since ReportNet (for more than 5 years already) and this ‘long-time in market’ allowed to develop a whole ecosystem of tools to breed on top of it.
    Concerning the question at the start of the post — there’s:
    Cognos Dynamic Report Updater — a tool to replace change column in every report.
    Moreover, look at whole bunch of tools, based on SDK:
    and tools, developed by third-party vendors:

    Oracle BI

    In Oracle BI, finding such changes is somehow easier, since all reports are stored as xml files in presentation catalog folder. You just write a simple script that iterates on files, finds a string in file and reports back it’s findings (or replaces this string with a new one). Sounds easy, but after spending a couple days at it, I’m not so sure anymore )
    There are a few major point against this way
    1) It’s unsupported, there are no declared report modification API’s, AFAIK, so you’re scripting at your own sake. That would not be a big problem, if not:
    2) Oracle BI presentation service adds a hash tag to report (and folder) names if they exceed 55 characters in length (see more on this topic here: http://obiee101.blogspot.com/2010/02/obiee-report-name-length.html).
    2.1) If report’s names are not in ASCII, they are converted to UTF-8 (but starting x symbol is replaced by #), so each non-ASCII symbol takes 5. So if you find file, containing desired string, you can only decode back first 9 letters of it’s name — report names are usually much longer than that. Therefore your searching capabilities are seriously limited.
    2.2) Folder name hashing poses another problem if you’re using Windows. It impossible to iterate through catalogcontaining hashes an # in it’s name — OS listing methods stop working. Hope that situation is better on Linux. An easy but entirely manual workaround is to rename catalog folders to numbers (1,2,3,4) before starting the script ant to rename it back after.
    I totally wish that this situation will change in nearest future and there will be a simple Presentation Server API call that will bring report name from file name and vica versa.
    As usual, if anyone is interested in scripts  I wrote for the task (python, as usual + vbs) — ping me.

    Freebase GridWorks — a data-driven approach to ETL

    Take 5 minutes to watch screencast for FreeBase Gridworks — an interesting new approach to data transformation. Instead of ‘transformation-based’ approach of every tool on the market, this tool uses ‘data-based’ approach, which looks rather intriguing. Especially ‘Undo-Redo’ function )

    I keep thinking when to use this tool. User driven data load in DWH? This tool lacks MDM capabilities in such case. Manual cleaning of csv files?…

    TM1 hint – TurboIntegrator Local Variables

    TM1 TurboIntegrator (TI) is truly a highly customizable tool :) I recently needed to be able to pass cube view names as a parameter for export process and it turned out to be really easy.
    Although DataSource is selected via menu during TI process creation and there are no TI functions to change it, it can be done easily via TurboIntegrator Local Variables. There are local variables in TI (namely, DatasourceType,DatasourceCubeview and some more) that are set during GUI datasource selection. But they can be set in procedure as well.
    So you just:
    1 Add a Parameter of type string

    2 Update needed TI Local Variable using this Parameter

    And then work with data as you like )

    There are many interesting Local Variables in TI, especially those that define ASCII output (delimeters, quotes).