BI and CPM Gartner Quadrants ‘10 are out

Rather interesting:

BI

CPM

The CPM Big Three(aka leaders quadrant) is formed and I think they'll just waltz around the quadrant in next couple of years.

PS: Just to notice, they say aggregate awareness for Cognos BI is in beta-testing. Just after I spent so much space promoting database-level aggregate navigation.

Oracle BI Enterprise Edition vs Cognos BI

Since I've spent some time working with Oracle Business Intelligence last year, I think I'd write a simple comparison list of strengths\weaknesess of OraBI EE vs Cognos (see post on the same topic by Venkatakrishnan). All of this is imho, of course.

Oracle BI EE Pro's:

1) Aggregate navigation

-- ability to set up aggregate tables in BI EE itself, which gives it 'aggregate awareness' while generating SQL. So if you have monthly sales aggregate you just set up BI metadata accordingly and all reports will target this table for monthly data.
There's no such feature in Cognos metadata setting.
But using this feature arises some questions:
a) whether aggregate definition should be BI-tool specific. A lot of practioners, including Kimball, insist that aggregate navigator should reside at database level so that every tool querying datawarehouse will benefit from aggregate avalaibility. This seems rather reasonable, since every datawarehouse is usually queried by more than one analytical application (BI, datamining, more BI) so you either direct all applications to BI EE, or agree to performance degradation.
b) aggregate tables desynch. When you just update the base fact table, aggregate tables become "out of synch", providing incorrect query results untill they're recalculated. This means that you either can guarantee that nobody will access reports in this period, or you can have incorrect data. Since more&more datawarehouses squezee load windows to reach real-time this problem gains priority.
c) choosing which agregate to use for answering questions. That's what statistics is all about in dbms, knowing number of rows and value distribution (lol). Oracle BI EE has the "table row count" feature (although I haven't seen it affect SQL generation yet, need more examples). But there's no value distribution analysis in there, so it's just one side of the coin (not talking about I\O device speed and other characteristics).

In general, it's recommended to use database-specific aggregate table functionality (Oracle Materialized Views or DB2 MQT) since it solves all the 3 questions given above and usually simplifies ETL process (and sometimes even speeds it up, since databases use their own transaction logs to detect what data has changed and what aggregates should be rebuilt). Too bad indexed views in Ms SQL do not work with aggregate dimensions (there's no way to define dimension hierachies there).

2) Cache management

-- OraBI has really profound cache management facility. You can "cache" any database query to OraBI specific storage structure (cache file), wich will allow subsequent queries to the same table\query to run without actual database request being made. This can greatly speed things up. I especially like the Event Polling Table feature: you add a table, which records when dwh table was last updated. OraBI then reads at given intervals and automatically invalidates old cache entries, based on this table records.
Moreover, if you have OraBI cluster this cache can be shared among servers.

There isn't anything even close in Cognos.

Although I greatly like this feature, I just want to warn about overusing it. It's easy to imagine BI developers boosting performance by adding more&more cache untill OraBI becomes a fully blown aggregate system. And sometimes it's just about 1 aggregate table at the dwh level ) Or about introducing OLAP server in the enviroment )))

3) SQL generation.

It's tricky subject, but for now I like OraBI generated SQL more. But it's "apples to oranges" for sure, since I usually use DMR's in Cognos which encumbers SQL greatly and there's nothing identical in OraBI EE.

Oracle BI EE Con's:

1) Multidimensionality

Cognos has Analysis studio and the ability to navigate hierarchies in both directions (you won't believe it, but in OraBI there's no 'Drill-Up', only'Drill-Down'). And DMR's and analytical functions (but their usage is a bit annoying, as it seems now, hope to write about it later). Anyway, Cognos is way much more 'multidimensionally-ready' than OraBI.

OraBI + Essbase is a work in progress and has a huge number of caveats (some fixed by patches, some not, some introduced). And the only way to use all Essbase functionality is to write direct MDX via Evaluate functions. That's a big problem, since it's hard for us to suggest OraBI on top of Essbase for now (till 11g once again). The only alternative is Visual Explorer, which is a very good tool, but it for top-analysts only (thick client, costly).

2) Metadata model development

Instead of OraBI's 'only-star schema', '3 layers of model' Cognos FM Manager doesn't impose any design principles, which allows more mistakes, but it makes some things way more simple. One of the first things I wanted to do in OraBI was a report using just a single table.  Well that's a really funny exercise (see posts over here) if a star schema is a must.

But the main problem is the lack of API for metadata changing and browsing. There's udml, but it's not supported officially. Therefore all current integration scripts (like adding users, merging repositories and working with hierarchy depth changes) are out of the law. Which doesn't stop anyone, but is pretty annoying.

3) Pixel-perfect reports

Oracle BI Publisher is a specific tool, aimed at generating a huge number of formatted reports, based on XML format definition files. It wasn't a part of Siebel BI, so 'integration stitches' still stand out. It's a nice tool, but it certainly lacks web-interface ) Therefore in Oracle BI there's a deep distinction between a simple formatted report (with lots of possible logic in it) and making this report 'printer-friendly' since for the latter you basically have to start from scratch by opening Ms Word )  This will change in 11g as they say )

Having said all that, I really wait for Oracle BI 11g edition to start using it with Essbase and I kinda like the product as it is for "relational-only" reporting.

I surely wanted to write a simple bullet point list at first )

Memory leaking in Cognos 8.3 BI

You know why i've started this blog in the first place? To get advice, of course. And today DesiCresnet shared a wonderfull piece of information concerning memory troubles in BI 8.3.

Those who aren't struck by occasional CAM-AAA-0071 “An internal error occurred” which stops the whole server can skip this post.
This error was discussed at:

http://www.ibm.com/developerworks/forums/thread.jspa?threadID=244357

http://www.cognoise.com/community/index.php?topic=4971.0

The story goes like this:
- java.exe (that powers BI), grows up to the point where it cannot obtain contigious memory from server and crashes
- Cognos BI doesn't recognize this error, so it throws out a general CAM-AAA-0071
- Cognos BI is unaccessible until restart

This error occurs more often in heavy-usage enviroments. In one of our projects, with lots of Event triggered admin-links and reports, it took only a coulple of hours before it striked.

Solution is not complete, but as I see it now:
1) Put the latest SP on BI (or a special hotfix, if one is avalaible)
2) Lower the avalaible memory for Cognos BI service (to 768 mbs)
3) Modify the HeapDecommitFreeBlockThreshold registry property

It's interesting if 8.4 is prone to such error.

PS: I'm a certified Cognos Technical Specialist now, as well. Need to get first-line support partnership, you know )

Cognos 8 EP troubleshooting import data from package

Tip of the day: Having common metadata layer for BI and Enterprise Planning is really cool, especially with the ability to load data from BI packages to EP via links. Just that it sometimes doesn't work. But there are always some workarounds.

With mostly very cryptical error messages or without any.
One of the problems might be just that DataManager service (which does the transfer) cannot allocate enough memory for operation. This is what we've encountered just today.

Good news are that this parameter is easily configurable, you can just edit cognos_install_path\bin\dataimportserviceconfig.xml file and set JMX (Java maximum allocated memory) to something more appropriate than default 128M (1024 is a nice number, for example).

Try this if your links fail ;)

PS: In this 'wonderfull' Java world, setting JVMX is a common boost-up technique. And there's a lot of places to apply it. Think about Cognos Rich Client Platform (Cognos 8.4 Contributor) for start...

Cognos 8 BI Tuning Hard Drive Usage

Just a quick hint and a reminder: there is a configuration option forcing Cognos to store temporary report results in memory rather than dropping them on disk. It's especially usefull for DMR's they are frequently cached on disk.

Quote from Architecture and Deployment Guide.

Depending on the size of reports and the amount of available memory, IBM Cognos 8 may access a physical disk when processing reports. To improve performance, you can ensure that report processing uses available memory rather than disk space. Using memory instead of disk space is particularly beneficial in cases where temporary files are
created on IBM Cognos 8 servers, causing information transfer from memory to disk. You can monitor the occurrence of temporary files using the c8_location\temp directory.Monitor this folder during report processing periods to determine whether temporary files are created as cclvpage*.tmp. To ensure that IBM Cognos 8 uses memory instead of disk space, in the rsvpproperties.xml file, edit the VirtualMemoryDiagnostics property to use unlimited memory (value = 2) rather than limited memory (value = 0):
<property>VirtualMemoryDiagnostics</property>
<value type="long">2</value>
Note: Remove the comment to enable the VirtualMemoryDiagnostics property.
For information about using the rsvpproperties.xml file, see "Advanced Report Processing Configuration Settings" (p. 96).

TM1 + Enterprise Planning, When, Why and How?

This post will be some extracts from a recent PoC, where we made a complex planning system including a rich scenario modeling part. So it was our first big TM1 + EP integration experience.

Some basic definitions:
Cognos Enterprise Planning (aka Adaytum EP) is a very scalable system, all computing is client-side, nice workflow organization, nice GUI (version 8.4 Eclipse one especially) with huge limitations when it comes to the moment of “show me the whole budget in one place” or to serious what-if modeling. But scaling is trivial, each server acts as many clients simultaneously (number of cores\CPUs) and adding more servers scales the system linearly. So to build really scalable systems, we made Contributor only working places, with all analysis done in Cognos 8 BI and Analyst was turned into a modeling tool. In adaytum and cognos 7 times a lot of things(consolidating, what-if) could be done (and to be honest, still can be) in Analyst, but just up the point you reached 40mln cells cube limit in Analyst, then you started dividing cubes and the system became nightmare. Transforming the same consolidation task into series of Contributor applications connected by admin links gave huge performance enhancements, but the "show me the whole budget" limitation was a major problem.
Cognos TM1 (aka Applix TM1) is an in-memory OLAP engine, ultra-speed calculations, server side computing, excel based gui (with web publication), limited workflow capabilities (compared with current EP), scalability limitations (imagine putting servers across whole Russia to ensure response time)

So when it comes to a planning system with large user base – it’s EP, but if some serious modeling and calculations are requied as well – it’s time to fire up TM1 as well.

Overall interaction scheme seems pretty simple:
tm1ep

1) Most of users work in EP inputting budgets, submitting and doing all other workflow related activities
2) After some user actions (saving budget, submitting, accepting it) data is transferred into TM1.
a. Detecting the desired action is easy – we’ve got Event Studio for that (see here and here).
b. Incremental publish is fired and changed elist is republished – that’s pretty fast
c. Then there’s a problem to extract only changed data from publication. That’s the same as with incremental admin links, you just write a simple database procedure that logs last data transfer time and create views showing only recent data (as usual, you want code samples – drop a line)
d. Changed data is bcp’ed\sqlloaded to TM1 server
e. TM1 TurboIntegrator process is fired, loading the extracted data into TM1
3) Analysts work with “live” data in TM1, doing what-if analysis (they dynamically add scenario versions for this purpose), having access to all enterprise budget and actual data.
4) Then a version could be used for top-down propagation and exported into EP (that’s pretty simple, just some writing into EP applications import tables)

The whole system seems pretty healthy, data transfers happens with 1-2 minutes latency (we used Save as trigger action). Doing any kind what-if analysis on a budget version only 2 minutes "stale" with ultra-speed seems very impressive to me. For instance, in that particular PoC the goal was to speed up overall company budget consolidation, including complex allocation and elimination rules. It took over 8 hours in their current system. And only 4 minutes in Tm1 -- imagine all the benefits you can get from that kind of speed.

Since there are more and more rumours about coming TM1+EP bundle, I just hope that step 2 will be more automated. For example, they can track delta xml's sent by contributor client, containing all data updates, instead of publishing and bcping. But I can see no way they can get rid of neccesity of creating 2 models, one in Analyst for Contributor applications and another in TM1. Although dlists\dimensions can be synchronized, calculations are utterly different (you can target specific cube cell in TM1 formula and only the cube slice in EP).

Essbase ASO key structure

Yet another post on Essbase )
Internal ASO storage structure is a block box, so unlike BSO. So while Roske haven't writen a book on ASO option, we have to wonder in the dark. 

The main description of ASO storage is "data is stored as key\value pairs". This quote comes from ASO Tuning WhitePaper, the only "internals" technical description of ASO out there. It's for 7, but all the concepts are still valid.

So, "key-value pairs" it is. Key length is very important then, because it directly affects:

a) size of the database, if you've got a billion rows cube, then -1 byte for every cell means 1 Gb less overall cube size (that's raw size, it'll be compressed later). 
b) query processing -- for every query cell key got to be calculated for data retrieval. Less key, faster the query, as I think.

You can see the key description on Application Properties page or issuing "query database test.db list aggregate_storage runtime_info;"  MaxL command .
Here's the sample output. 

parameter                                         value                                            
+-------------------------------------------------+-------------------------------------------------
 Dimension [Date] has [3] levels, bits used                                                        1
 Max. key length (bits)                                                                            1
 Max. key length (bytes)                                                                           8

we see following characteristics of dimension -- number of levels and number of bits used to encode values in this dimension. Cell key is concantenation of each dimension keys. 

But that number of bits is pretty cryptic, though. On a project we're doing we have a couple of 5 mln elements dimensions and key length for them is around 40 bits. If you use simple binary encoding -- you can encode 2^40 elements (that's really a huge number, it contains 14 digits). It's no a simple binary encoding then. 

So how are dimension elements encoded? 

Well, it's allways easy afterwards, so I'm not so proud now, as I was when it first struck me )

So I now think that the technique called hierarchial encoding (or indexing) or something similar is used, so each dimension element is encoded the following way:

Let's assume that we have 3 levels in dimension, then the key will be formed like:
xxxyyzzzz
where xs -- are the bits requiered to binary encode level 2 parent
ys -- level 1 parent
zs -- the bottom level element

So to form the key length for all dimension you have to concantenate binary keys required to encode elements on every level. 

Therefore there are a few things to keep in mind while doing big-scale ASO projects:
- think about number of levels in big dimensions, keeping them short greatly decreases size
- try to keep number of elements even on every level -- that's funny, but it'll help to fully "pack the key" 

But that brings up a few more things about ASO i'm thinking of:
- how are pages in tablespace filled for data storage? I think some sort of hierarchial clustering, using key parts to point at pages
- what and how is stored in the outline (otl file)? Membernames and aliases for sure, but also the dimension elements keys, which are concantenated to get cell key and find the data page in tablespace.

 
So I'm still waiting for Edward to publish ASO book, there's a lot of questions. The book seems to be close )

Loading data into ASO cube

Hm, I guess it's the time for a first big essbase-related post. Today we'll talk about data loading in ASO cube.

Those who've listened to our recent Oracle events presentations (russian TechForum 2008, BI Forum 2009), know that we've made a nice PoC project with SportMaster, consisting of loading their year's worth of daily stock and sales data (around billion fact cells with over a million elements in products dimension).  All in all, I've decided to post a write-up about how to achieve acceptable load speed on such volumes and tell a scary-story, with a detective twist and a happy ending.

I won't go in much detail on basics, so:

Since it's a billion cells cube, it's ASO storage option.

Since it's an ASO Cube, we use load buffers for data loading.

Since we use load buffers, we try to split input data into chunks, to maximize data read speed.

Since we use load buffers and we were doing PoC on 9.3.1, we have to extract data from DWH to text files (in 11 we could use ODBC based load buffers).

Just a remark: Essbase Integration Services does a lousy job on either building dimensions (we're adding 14 attributes for products and loading a file again for _every_ attribute is a disaster) or loading data (no visible ability to run multiple buffer load). No comments on 11 Essbase Studio, haven't tried it.

So, we've extracted data from DWH to text files and are loading these files into text buffers. While loading data from file into buffer, with a simple rule file I could achive data read speed of around 6 Mb\s (on a 80 Mb\s disk system, we've got two such disk systems attached on PoC server). Loading two files in parallel, gave 12 Mb\s. So to fully load up disk system we need to chunk files into 6 parts. That actually doesn't present a problem, but seems a bit awkward. While testing, I've accidentially forgot "using server rules_file" in MaxL script and was astonished by load speed of 25-40 Mb\s! That's how I've found out about 'free-form data load' )

Essbase has a built in ability to load data without rule file (see DBAG). And, moreover, this is almost 10 times faster than loading with rule file. Easiest way to find more about free-form load is to export an ASO cube to text file (right-click on database in Essbase Administration console). Moreover, free-form allows to load data in cross-tab (pivot accounts in columns), therefore greatly suppressing number of rows. Using this technique requires additional data formating: spaces as separator, double quotes enclosure, writing "Missing" or "MI" for null values. But load speed is worth it )

So the bottom line: free-form data load, additional data massage -- 70 Gbs of data in less than 30 minutes, almost reaching half of disk system speed.

That's it for fast data loading, if you're on 9.3.1 i'd recommend trying to massage data into free-form and load it this way.

Now for detective part of the story. Read the rest of this entry »

Cognos 8 BI Dashboard Studio

Finally got my hands to installing and trying it.
Well, it looks nice, allows flash-based chart formating and turns Cognos portal pages into more dynamic "dashboards".

But it's strangely slow (tried on a couple of computers) and all in all looks kinda "alpha" and plugged in at last moment. Installation is not that easy too, it requires war building and deployement, procedures that not every BI consultant is aware off.

So I think we should wait for another couple of service packs and there'll be a new standard tool for portal page development. XCelcious should be afraid )

Cognos and MS SQL — happy together

 Correcting myself. I was ranting about MS SQL and Cognos interaction. I was totally wrong.

The problem, described shortly is:

  1. you have 2 databases in MS SQL residing on the same physical server

  2. you create a join “table A from db1 with table B from db2”

  3. Cognos creates two select queries (select * from db1.A) and (select * from db2.B) and then does join locally on server. That’s terribly slow. Cognos should be generating one select query, performing the join on the db server and that’d be really fast.

The answer is simple: there’s no way Cognos can find out that these two databases are on the same physical server. But we can help old fella a bit, setting a datasource property (Content Manager Datasource) the same for both datasources. Therefore, the same datasource prefix will be generated for both tables and they’ll be regarded as residing on the same server. Query speed will increase dramatically.

Morale:  always set the datasource property for MS SQL databases located on the same server. 

FireStats icon Powered by FireStats