Cognos Transformer multi-server cube building revisited

I was decribing the 4-year old trick of using Cognos server cluster to build powercubes and got a new idea.

Really, PowerPlay cubes building via scripts is a so 1990s, 21st century is for enterprise scale, failover, monitoring and you-know-the-pitch.

If we make Transformer cube build a Report/Job in Cognos Connection, we get:

  • load-balanced cluster builds (cube building task will start on any avalaible server, we need to ensure server will be able to run the build script)
  • monitoring (the same Cognos System \ Schedule pane in Administration) and build history
  • event-management (builds will be recorded in audit tables and we can do notifications, restarts and scheduling in Event Studio)

The question is how to make cube build a Job / Report run with minimum of hassle. I was originally thinking of a simple ASP page, running a cube build batch script, used as a web-service datasource with cubename as parameter, but (that’s the a-ha moment) you can use a simple Java application, running in the same application server as Cognos itself. That’ll give platform independance and nice url (ibmcognos/buildcube.jsp?cubename=…).

Moreover, you can package cube build scripts into the application itself, make them configurable (model location, backup location, number of copies to retain, etc), make the application detect Cognos version (to know how to swap cubes) and detect operating system (what scripts to use) and you’ll have all the cube build logic packed into a neat single file container that you just copy accros all servers in cluster. That’ll take away all script maintenance burden and give all the benefits of centralized enterprise way. And it’s not a big development effort, I’d estimate it a couple of weeks spent with coding hat on.

Strangely that neither of big “all-things Cognos” shops (BSP or Motio) didn’t do something like that, there’s still a huge lot of PowerPlay shops around that’d love something like this. And some of them are going from 8 to 10 right now and that requires rewriting parts of cube-building scripts ;)

TM1 MDX querying snippet

I’m playing around with some TM1 ideas and all of them require some sort of sparsity analysis. And since TM1 supports MDX (to some extent, to say it mildly) and I like MDX and used it a lot with Essbase, I thought it’d be wonderful to use MDX.

And any post is better with a cat picture in it, don’t you agree? This one is about a casual TM1 developer embrasing the ocean of MDX possibilities.

But MDX-dialect of TM1 isn’t well-documented (again, being very polite), so learning it’s intricacies is mostly trial and error and an open field for experiments. And whilst you can write MDX-driven subset definitions (wonderfully described by Phillip Bichard and Martin Findon) in Architect, there’s no actual tool where you can write “select years on rows, measures on columns from sales” and see you table (all the Cognos studios do that, but in background and with quite inefficient way of writing you own). You can go wild on the ODBO page of a TI process as well, but that’s a bit hard from error handling and results viewing.

 

So at the end of a day I wrote a small Java based script that allows you to write MDX queries to your TM1 server and get result back as a csv file (there’s an Excel workbook doing the same thing, but I’m a Java guy and I needed the querying code to conquer the universe anyway).

 

How does it work?

1) You fill in a query you want to fire in an xml file with some additional options (admin server, login/password, file to write query results to, will you need headers and all that)

2) Run java tm1_mdx your_query_file.xml and get either an incorrect MDX syntax error (bad, I get this often) or results in a CSV file (good)

 

As usual, there’s source code in the package, so you can adjust it \ use it as a base for something bigger.

How to install it?

  1. You will need  Java runtime engine (jre). This the only requirement, there’s no need to have TM1 installed or anything.
  2. Download the tool (this package is for 9.5.2, ping me if you want it for some newer\older version, it’s just a matter of recompiling), unpack to some directory (c:\tm1mdx, for example)
  3. Cryptic step of adding security certificate to your Java.
    1. Find jre\bin\keytool.exe utility on your computer
    2. Open the command line and navigate to jre\bin
    3. Run command line keytool -keystore ..\lib\security\cacerts -alias Company -import -file c:\tm1mdx\applixca.der
    4. Enter ‘changeit’ as password
    5. Type yes
  4. Fill in the mdx_query.xml
  5. Open command line, navigate to c:\tm1mdx, run java tm1_mdx mdx_query.xml

 

PS: By the way, I glimpsed into TM1 10.1 API docs and all the APIs were greatly enhanced in this version (mostly for Performance Modeler, it seems), so you finally have a per line grained access to rule files and such. It’s obvious that Performance Modeler uses Java API (it’s using Cognos Rich Client Platform, it’s inevitable),  so it’s extensively used and developed and might even get documented at some point )

PSPS: It’s interesting whether there’ll be any IBM-made XML/A driver for TM1 at some stage. It seems possible to do a  straw olap4j driver for TM1 even now and that’d be a funny way of opening it up a bit further.

Cognos Content Store Security Overview in SQL

Ever found yourself wondering what users are really there in your BI Consumers or any other group? Sure, you’ve got a few of them explicitly listed there (bad practice) but mostly you’ve got just a list of other groups like “Western Division”, “Marketing” and all those (good practice). And they’re, most likely, nested into each other a few times (like Western Division OpEx and Marketing Asia).  Or, looking at it from totally other point of view — what groups Jack Smith is really in (again, with all these nesting ones)? Is he a member of Western Division or what?

 

Built-in Cognos Security administration is quite awkward (compared to so long gone Access Manager, sigh), so questions like above are so frequent. There are 3 main solutions:

1) Buy an externally developed tool from BSP or Motio. I had a chance to look at BSP MetaManager recently and it does a whole lot of stuff to make your administration life easy. Bit of an overkill to just ask “who’s there”, but totally worth it if you’re going big scale \ long term.

2) Buy an SDK license and use the sample script from IBM as a base and extend it just for this task and then use SDK left and right to do anything you want in Cognos. If you have an SDK license through some global license deal — the sample script maybe just what you need to get the job done.

3) Write SQL queries to Content Store database to return just this info. It’s an unsupported (I’ll even stress this) unsupported way of doing it, but if you just want to have a quick look or are happy to loose it at some upgrade point (although it works on both 8.4 and 10.1, CS database schema isn’t changing that much), here are the scripts for both MS SQL Server and Oracle.

 

I used the script from this SQL.RU topic as a base and added group nesting unwrapping with Connect By in Oracle and CTE’s in SQL Server. You can turn this SQL into FM query subject and have a nifty report.

 

 

Continue Reading »

New recruit to my ETL toolbox

I’ve recently completed my first real DataStage project and took a chance to get certified while all the stuff is still “fresh”. Certification itself is quite complex and I didn’t use most of the tricks depicted in questions up until the moment when one of the jobs had to process a quarter billion of rows in a reasonable timeframe. From that point on I learned quite a lot about partitioning, balancing, debugging and choosing right stages to do the job (who would’ve thought that RemoveDuplicates is waaaaay slower than Sort (with Remove Duplicates option) — why put RD stage in at all?) Anyhow, now I’m also an IBM Certified Solution Developer — Infoshere DataStage v8.5 )

So my current ETL tools breakdown goes smth like (not counting PoC and likes):

  • Oracle Data Integrator — 3 projects
  • Pentaho Data Integrator — 2 projects
  • IBM InfoSphere DataStage — 1 project

And that’s my current preference list as well. I love ODI’s flexibility (it’s actually very simple once you get it how it works and it’s extremely configurable), ELT approach (I’d rather be tuning my DBMS than DBMS and a separate ETL engine). PDI is very open and quite user-friendly (compared to DS, for example) and it’s easier to understand & debug than ODI. PDI community edition is enough for most small data sized integration projects and enterprise version is very affordable. Datastage is terrifically well-suited for big data volume tasks and parallel processing, but is quite an overkill in small projects.

It’s interesting that although I did quite a bit of DWH model design I written have just a few posts on this topic. But every time I think about writing out some advice — I think that the best advice is to just go read the books. And if you still have questions — reread them ) I’m reread Kimball’s books a few times already and every time gives you an “ah, that’s what they meant” moment based on your recent experience.

Anyhow, my last couple major DWH projects were for government agencies and I packed a number of simple but effective modeling tips exactly for them. Hopefully I’ll write them out in nearest future. Just need a free weekend or two.

Cognos Express and BI free trials

I must be the last one to notice, but you can actually try out Cognos 10.1 BI and Cognos Express for free (time limited trial). That’s very convenient and is totally a major shift from usual ‘we have products soooo good that you have to ask partners to show them’ policy.

‘Open download and use but paid support’ policy advocated by Oracle (for example) can actually attract more people to work with the products and to buy them eventually, even when they are already at a different company. It really worked well on russian DBMS market, where Oracle is the only ‘big enterprise’ choice.

On the other hand, both BI and Express are quite complex products, so mere effort of trying to make them work in non-obvious tasks can lead to utmost frustration. But both of them are packed with demos and samples, so that should make life a bit easier.

All in all, I’m really glad this is happening and am rooting for time-unlimited trials. You won’t build a serious system without fix packs anyway )

Do you want your TM1 go twice faster on Intel-box? Turn HyperThreading off

This’ll be a bit long (but with a hidden bonus for attentive reader), so I’ll start from conclusions. If you’re using a recent server with Intel CPUs, you’d better check whether they have Hyper-Threading (HT) and try turning it off to gain 2x speed boost.

How to do it:

1) go to server, open command line, type systeminfo (processor info will be in first ten lines). Or if you don’t like it cool, open Control Panel, CPUs are described on second pane )

2) check whether your CPU’s are in this list

3) ask you server admin, or manually reboot server, look into BIOS and turn HyperThreading off

4) Test your TM1 processes, they may speed up considerably

 

What’s behind the scenes.

Hyper-threading, in a nutshell, is a technique showing each physical CPU core as 2 logical ones. Since during normal work a lot of time is spent in thread switching and related registry load/unload and yada yada, physical cores actually are underloaded in common multithreaded system. Adding a duplicate registry set and a ‘virtual’ core allows to utilise physical core up to 20-40% more, earning throughput benefits.

Key words here are “threading” and “throughput”, meaning that HT benefit multithreaded applications where lots of small request are processed at the same time. OLTP databases are a prominent example.

But most of the systems I work with, almost all OLAP engines and even DWH-tuned DBMSes, actually suffer in this scenario.

For example, TM1 calculations are executed in a single thread, so if this thread is assigned half an CPU core, speed drops significantly. The same logic goes for Essbase metadata update process, for example, or for Cognos Enterprise Planning Job processing.

I’ve seen significant performance degradation (30%) in Cognos EP job processing due to HT context switching and have advised turning HT off on all Cognos EP Job servers.

Encountering the same issue with TM1 recently left me puzzled, since there was plenty of empty CPUs on server and there shouldn’t be any thread switching. But we’ve got 2+ time speedup after flickering the switch anyhow )

 

I promised a bonus and here it goes:

When buying a TM1 server with Intel CPUs, buy way more cores than you’ll license. Physical cores, not logical.

This will turn on Intel Turbo-Boost and it might give you another 30-40% speed boost. I haven’t tried that yet and there’s a bit of scepticism of current boost-detection approach in Wikipedia, but it’s definitely worth testing in a lab before you buy hardware. Cores are cheap compared to licence costs, so you may save yourself a ton of money by having a faster system. As in wikipedia example, for Core i7-2920XM you can boost up to 3,5 Ghz per core with 2,5 Ghz base speed. That’s a hefty speed up for a good old one-threaded TM1.

Cognos Enterprise Planning PAD contents, duplicate entries and deployment problems

Seems like I can rename this blog ‘Reminisces’ any day now. After some years of not prodding into Cognos EP I was asked to look at a rather confusing error. Export deployment wouldn’t work, throwing ‘null’ error initialising object selection dialogue. Yes, as these two IBM support articles (1,2) state: it’s a problem with duplicated objects in PAD. But since was the first time I saw this rather massive instance (80+ apps, hundreds of macros), I was interested in finding a way to automate this task. Although number of EP users is decreasing, hope that anybody facing this issue in future will benefit from this post. And if you’re curious about how things are stored in PAD, scroll over )

 

There are a few things that can be duplicated in PAD, so I’ll describe each situation accompanied with code I wrote to check for duplicated items. Fortunately, export dialogue shows what it’s checking at failure point, so you’ll know where to look.

Table of contents

Macros

All you need to know about macros is store in P_MACRO* tables. P_MACRO holds general macro info, including it’s name that is potential duplicate. EP developers must’ve overlooked the fact that macro name is case-sensitive from Administration Console point of view and case-insensitive from export deployment wizard’s point of view. So if you have two macros: copy_sales and copy_Sales, you’ll find out about this duplication only when you’ll try to export deployment.

Code checking for such situations is very straight-forward.

SELECT * FROM p_macro WHERE
 LOWER(rtrim(ltrim(macroname))) IN
 (
 SELECT LOWER(rtrim(ltrim(macroname))) FROM P_MACRO
 GROUP BY LOWER(rtrim(ltrim(macroname)))
 HAVING COUNT (*) >1
 )
 ORDER BY LOWER(rtrim(ltrim(macroname)))

Found some duplicated macros this way, but the problem persisted.

Macros are also recorded in PAD xml (see below), but table approach is way easier.

Applications

Applications pose a more interesting problem.

ADMINOPTION table

Most of the application details are stored in ADMINOPTION table of each applications scheme (Oracle) or database (Ms SQL Server). I wrote some time ago about updating JVM options for Contibutor client via this table. In my search for duplicates I was interested whether there are duplicate application ids or display names. So it made sense to query all adminoption tables at once.

Here’s a sample Oracle SQL script for such queries. It’s easy to replicate it for MS SQL server (seems that didn’t publish such examples as of yet, mail me if you need them).

CREATE TABLE temp
(USR_NAME VARCHAR(50) NULL,
OPTIONID VARCHAR(250) NULL,
DESCRIPTION VARCHAR(512) NULL,
OPTIONVALUE VARCHAR(512) NULL);
commit;
BEGIN
  FOR usr IN (SELECT owner FROM all_tables
  WHERE TABLE_NAME = 'ADMINOPTION')
  LOOP
  EXECUTE IMMEDIATE 'INSERT INTO temp (usr_name, optionid, description, optionvalue)
  select '''|| (usr.owner) ||''' USR_NAME, OPTIONID, DESCRIPTION, OPTIONVALUE from ' || (usr.owner) ||'.ADMINOPTION' ;
  END LOOP;
END;
 
-- check for duplicate application_display_name
SELECT optionvalue, COUNT(*) FROM temp
WHERE optionid = 'APP_DISPLAY_NAME'
GROUP BY optionvalue
HAVING COUNT (*) > 1

Strangely, such exercise showed me some duplicated names, but detaching those apps from PAD didn’t help, Export wizard was still complaining about duplicated apps.
So I started SQL profiling to see what was really happening under the hood and was impressed by finding

PAD xml

Turns out, all application details are stored in an XML file located in P_PAD table. This file is actually huge and messy, but it contains some fields that are not available in ADMINOPTION table or Administration console output. Deep inside this file, applications are described like:

<applications>
<application id="{94BF5F99-E6AF-4D9C-BA63-E2303055296B}" name="go_capex_contributor" 
objidref="obj_APPLICATION"><properties><property propidref="pr_DATASTORE_CON_DOC">
<![CDATA[<connection>   
...
  <applicationname>    go_capex_contributor   </applicationname>  
</connection>  ]]>
</property><property propidref="pr_DISPLAY_NAME"><![CDATA[go_capex_contributor]]></property>
<property propidref="pr_UNIQUE_ALIAS"><![CDATA[test]]></property>
...
</applications>

And it turns out that the ‘name’ property held duplicates in my case. Since manually extracting properties is tad too tedious for me, I wrote a simple python script that does exactly that. It’s not a complete solution, more an approach sample. I’m using pyparsing as usual. Continue Reading »

A cow jumped over the earth

Big news: we’ve relocated to Sydney, Australia (yep, the one with the Opera house, kangaroos and Fosters).

And as of 1st of November I’ll be joining small, but proud ”bunch of guys that share the same passion and dedication for Cognos (and sometimes the same sick humour :))” ©, namely, PMSquare.

There’ll be more upside-down stories here, I promise.

 

As they say it here, “Cheers to you all” ;)

Cognos BI SDK revisited

After, oh my god, 4 years, I’ve spent a couple of days typing Cognos 10 SDK Java code. What changed in BI automating during this period?

Well, the biggest change was trivial: I’ve forgot everything ;)

But that turned out to be good news, since version 8.3 to version 10 changes are huge. I remember reading SDK Developer Guide overnight for 8.3, but now it’s 3,500+ pages long! As it turns out, you can still read all the needed stuff in a short time period, but understanding what to read is a challenge.

So it’s best to start with this proven practice article, providing some kind of helicopter view of what can be achieved by what parts of SDK.

Also take a look on this 4 part blog series by Peter Beck of BI Professional site.

From there, I suggest you dive into sdk\samples folder and look through the code provided there. However complicated it may look, it’s actually way more integrated than any Knowledge Base\Technote code sample you can download, since extending those turns into rewriting the built-in samples functionality. Trust me on this, I’ve spent half a day doing exactly that before giving up and building upon existing samples.

Where to look when you’re stuck trying to understand a code piece or function? I used javadoc generated documentation instead of Developer Guide, because it’s Java-specific. I used Developer Guide just like an overall Class&Methods reference.

As for packing samples content: minimum required set for content store manipulation (add\remove group, members etc) is (all paths are from sdk\java\):
Common\CRNConnect.java — connecting to Cognos Services
Secuity\Logon.java — you’re not using Anonymous  logon, do you?
HandlerCS\CSHandler.java — all Content store manipulation wrap-ups

Some more SDK sample links:

Cognoise SDK for Cognos 8 forum

Cognoise SDK for Cognos 10 forum

IBM SDK-related articles and Technotes

 

PS: It’s funny, I’ve wrote both TM1 Java API code and BI 10 SDK in last three months ) Can’t think about any combined methods that’d be useful. Any ideas?

Using session variables in Cognos BI

Just a quick Cognos BI hint: you can use session variables to store project-level constant values.

I’m a big fan of ‘feature-rich’ ETL reports showing not only what dimension element mismatch between systems, but also allowing seamless editing of element mapping. This usually means drilling down from report into external application for dimension mapping. Parameters are usually passed via URL (easiest possible way).
After server name and, therefore, URL changed for a second time in current project, I’ve set up a project level ‘severname’ constant to avoid XML find\replace for each report.
It’s really easy:

1) Add a session variable to your Framework project, write your required constant value there. Like ‘servername’ = ‘awesomebi’

2) Use it in Report Studio, just type

#sq($variable_name)#

in expression editor. sq encloses string in single quotes.