It’s a virtual world, part 1

My working process these includes tossing around lots of virtual machines, so I’ve gathered a couple of tricks. Let’s talk about getting your back better (this post) and lightening your CapEx (next one).

 Getting rid of client-secure laptops

Remote access to client environment sometimes comes with a physical burden of a laptop that is “secured” to access VPNs. Carrying it around alongside your main machine can be quite frustrating (and bad for your stature!).

You can try setting it up with 2 networks, leaving it in office and remotely connecting to it, but that causes all sort of maintenance issues. I still remember how one of my colleagues tried to find somebody in the  middle of the night office to wake up a machine that went to sleep.

There’s a simple solution — convert this laptop to a virtual machine. VMWare converter will create a virtual machine out of a physical one in an hour and it’s free to use. You’d need admin rights for the process, but I’m sure you’ll find some excuse to temporarily elevate permissions ) I’m dutifully carrying a couple of converted laptops VMs, ensuring very fast support response time.  And my back is fine, thank you.

TM1MN

It’s time to open-source / make available a utility I wrote more than a year ago. I originally intended to develop it  more whilst doing more TM1 projects, but my focus is mostly BI / DM these days, so I might as well just let it out and make changes by requests, if any. Enter TM1MN (hoped to make it TM1 Metadata Navigator, nice acronym, imho, sigh).

Current version allows you to get such an Excel table from your TM1 model:

SourceType Source Target TargetType LinkType Link Details
pro plan_load_budget to_report_cube cub plan_Report pro CellPut
pro plan_load_budget to_report_cube dim plan_source pro SubsetCreate for subset Vplan_source_SSN SubsetElementInsert for subset Vplan_source_SSNSubsetDestroy for subset Vplan_source_SSN
cub plan ExchangeRate cub plan BudgetPlan DB DB line 11,15,19,23,27,31
cub plan ExchangeRate cub plan_Report DB DB line 5,9,13,17,21, 40,44,48,52,56
cub plan ExchangeRate cub plan BudgetPlan LineItem DB DB line 6,11,16,21,26
pro plan set_language pro plan set_language pro Calculated ATTR for Attribute vFromAttr Calculated AttrPut attribute vAttr
dim plan business_unit cub plan BudgetPlan Attr Attribute currency from stmt line 11, 15, 19, 23, 27, 31

And  (with additional hassle) a diagram like this:

How to use it:

1) Download and unpack executables (source code is there as well)
2) Run tm1mn.exe -i path_to_your_TM1_server -l log_file (or it leave blank to dump everything in console)
3) Open graph_contents.csv file in Excel
4) It also generates a dot graph representation of the model, so you can download graphviz and run dot graph_to_draw > graph.svg to get a scalable A3 picture of your model

 

Behind the scenes:

TM1, in contrast with Enterprise Planning, is very-very open, rules and TI are text files you can just read with any script. I really missed dependancy analysis (what rules / TI affect this cube and how) or even visualisation from EP Manager. So I started writing a script that will work through rule and process file and find “connections” between cubes, dimensions and data sources.

Most reasonable folks (as the guys at bihints) would’ve gone with simple “search for DB’s and ATTRS”, but it’s quite limited approach, since TM1 is pretty rich in syntax (you can have nested IF statements in DB target, for example). I thought that there should be a better way. Being damaged by CS education, I thought about writing a decent TM1 syntax parser (LR, for anybody who cares).

Easier said than done, but I’ve got a working pyparsing-based TM1 rules and processes parser. Originally I thought about doing quite a bit of automated model analysis (search for calculation cycles, search for missing/unnecessary feeder statements), so parsing results are recorded as a NetworkX oriented multi-graph, providing built-in graph operations (find shortest path, neighbours, connections).

Moreover, any NetworkX graph can be exported into DOT format, so it’s possible to use graphviz to generate nice model pictures (as here).

All that should’ve been tied to some sort of GUI to allow filtering cubes, interactive diagrams creation and yadi-yadi, but that’s the point where I got to my limits.  Can’t really do GUIs, as it turns out.

So I’m just dumping all graph contents into a csv file, so you can use Excel’s Autofilter magic for dependancy analysis. I’m routinely using this for any TM1 model analysis (like finding an overfed cube with Perfomance Monitoring and looking and what rules feed it).

Updates

Update 1 (version 0.11): uploaded a corrected ) version of the script, with proper error handling.
Update 2 (version 0.12): newer version, with better ASCIIOutput handling and TI variables names handling.
Update 3 (version 0.13): Little more bugfixing and clean-up.
Update 4 (version 0.14, 26/11/2012): Even more bugfixing and proper quoting for dot file edge labels.

Cognos BI logging setup

Continuing our weekly, monthly, quarterly instalments, let’s talk about Cognos log files management.

I’m really puzzled by the fact that whenever I walk in on a clustered Cognos install, error logging is always set up just wrong!  Meaning not the way I like it :)

Cogserver.log file is your biggest ally in debugging anything happening in the environment, but do you really want to look around on 3-4-5 log files on every server?

1) Set up local log file on one server in a cluster (I use one of dispatchers for this) and point all the others to this one as a remote log server. Bingo, errors/warnings from all servers will fall to a single file. And that file suddenly gets a bit bigger and harder to digest.

 

 

 

 

 

 

 

 

 

 

 

2) Go to cognos_install\bin\ folder and run logView utility (logViewV2 in Cognos 10, logconsole in 8). It’s a very nifty tool that gives you ability to add filters statements to search for specific error messages, filter time ranges and components:

 

 

 

 

 

 

 

BTW, don’t turn off local logs on other servers, they might prove useful if that single logging dispatcher fails.

Cognos Transformer multi-server cube building revisited

I was decribing the 4-year old trick of using Cognos server cluster to build powercubes and got a new idea.

Really, PowerPlay cubes building via scripts is a so 1990s, 21st century is for enterprise scale, failover, monitoring and you-know-the-pitch.

If we make Transformer cube build a Report/Job in Cognos Connection, we get:

  • load-balanced cluster builds (cube building task will start on any avalaible server, we need to ensure server will be able to run the build script)
  • monitoring (the same Cognos System \ Schedule pane in Administration) and build history
  • event-management (builds will be recorded in audit tables and we can do notifications, restarts and scheduling in Event Studio)

The question is how to make cube build a Job / Report run with minimum of hassle. I was originally thinking of a simple ASP page, running a cube build batch script, used as a web-service datasource with cubename as parameter, but (that’s the a-ha moment) you can use a simple Java application, running in the same application server as Cognos itself. That’ll give platform independance and nice url (ibmcognos/buildcube.jsp?cubename=…).

Moreover, you can package cube build scripts into the application itself, make them configurable (model location, backup location, number of copies to retain, etc), make the application detect Cognos version (to know how to swap cubes) and detect operating system (what scripts to use) and you’ll have all the cube build logic packed into a neat single file container that you just copy accros all servers in cluster. That’ll take away all script maintenance burden and give all the benefits of centralized enterprise way. And it’s not a big development effort, I’d estimate it a couple of weeks spent with coding hat on.

Strangely that neither of big “all-things Cognos” shops (BSP or Motio) didn’t do something like that, there’s still a huge lot of PowerPlay shops around that’d love something like this. And some of them are going from 8 to 10 right now and that requires rewriting parts of cube-building scripts ;)

TM1 MDX querying snippet

I’m playing around with some TM1 ideas and all of them require some sort of sparsity analysis. And since TM1 supports MDX (to some extent, to say it mildly) and I like MDX and used it a lot with Essbase, I thought it’d be wonderful to use MDX.

And any post is better with a cat picture in it, don’t you agree? This one is about a casual TM1 developer embrasing the ocean of MDX possibilities.

But MDX-dialect of TM1 isn’t well-documented (again, being very polite), so learning it’s intricacies is mostly trial and error and an open field for experiments. And whilst you can write MDX-driven subset definitions (wonderfully described by Phillip Bichard and Martin Findon) in Architect, there’s no actual tool where you can write “select years on rows, measures on columns from sales” and see you table (all the Cognos studios do that, but in background and with quite inefficient way of writing you own). You can go wild on the ODBO page of a TI process as well, but that’s a bit hard from error handling and results viewing.

 

So at the end of a day I wrote a small Java based script that allows you to write MDX queries to your TM1 server and get result back as a csv file (there’s an Excel workbook doing the same thing, but I’m a Java guy and I needed the querying code to conquer the universe anyway).

 

How does it work?

1) You fill in a query you want to fire in an xml file with some additional options (admin server, login/password, file to write query results to, will you need headers and all that)

2) Run java tm1_mdx your_query_file.xml and get either an incorrect MDX syntax error (bad, I get this often) or results in a CSV file (good)

 

As usual, there’s source code in the package, so you can adjust it \ use it as a base for something bigger.

How to install it?

  1. You will need  Java runtime engine (jre). This the only requirement, there’s no need to have TM1 installed or anything.
  2. Download the tool (this package is for 9.5.2, ping me if you want it for some newer\older version, it’s just a matter of recompiling), unpack to some directory (c:\tm1mdx, for example)
  3. Cryptic step of adding security certificate to your Java.
    1. Find jre\bin\keytool.exe utility on your computer
    2. Open the command line and navigate to jre\bin
    3. Run command line keytool -keystore ..\lib\security\cacerts -alias Company -import -file c:\tm1mdx\applixca.der
    4. Enter ‘changeit’ as password
    5. Type yes
  4. Fill in the mdx_query.xml
  5. Open command line, navigate to c:\tm1mdx, run java tm1_mdx mdx_query.xml

 

PS: By the way, I glimpsed into TM1 10.1 API docs and all the APIs were greatly enhanced in this version (mostly for Performance Modeler, it seems), so you finally have a per line grained access to rule files and such. It’s obvious that Performance Modeler uses Java API (it’s using Cognos Rich Client Platform, it’s inevitable),  so it’s extensively used and developed and might even get documented at some point )

PSPS: It’s interesting whether there’ll be any IBM-made XML/A driver for TM1 at some stage. It seems possible to do a  straw olap4j driver for TM1 even now and that’d be a funny way of opening it up a bit further.

Cognos Content Store Security Overview in SQL

Ever found yourself wondering what users are really there in your BI Consumers or any other group? Sure, you’ve got a few of them explicitly listed there (bad practice) but mostly you’ve got just a list of other groups like “Western Division”, “Marketing” and all those (good practice). And they’re, most likely, nested into each other a few times (like Western Division OpEx and Marketing Asia).  Or, looking at it from totally other point of view — what groups Jack Smith is really in (again, with all these nesting ones)? Is he a member of Western Division or what?

 

Built-in Cognos Security administration is quite awkward (compared to so long gone Access Manager, sigh), so questions like above are so frequent. There are 3 main solutions:

1) Buy an externally developed tool from BSP or Motio. I had a chance to look at BSP MetaManager recently and it does a whole lot of stuff to make your administration life easy. Bit of an overkill to just ask “who’s there”, but totally worth it if you’re going big scale \ long term.

2) Buy an SDK license and use the sample script from IBM as a base and extend it just for this task and then use SDK left and right to do anything you want in Cognos. If you have an SDK license through some global license deal — the sample script maybe just what you need to get the job done.

3) Write SQL queries to Content Store database to return just this info. It’s an unsupported (I’ll even stress this) unsupported way of doing it, but if you just want to have a quick look or are happy to loose it at some upgrade point (although it works on both 8.4 and 10.1, CS database schema isn’t changing that much), here are the scripts for both MS SQL Server and Oracle.

 

I used the script from this SQL.RU topic as a base and added group nesting unwrapping with Connect By in Oracle and CTE’s in SQL Server. You can turn this SQL into FM query subject and have a nifty report.

 

 

Continue Reading »

New recruit to my ETL toolbox

I’ve recently completed my first real DataStage project and took a chance to get certified while all the stuff is still “fresh”. Certification itself is quite complex and I didn’t use most of the tricks depicted in questions up until the moment when one of the jobs had to process a quarter billion of rows in a reasonable timeframe. From that point on I learned quite a lot about partitioning, balancing, debugging and choosing right stages to do the job (who would’ve thought that RemoveDuplicates is waaaaay slower than Sort (with Remove Duplicates option) — why put RD stage in at all?) Anyhow, now I’m also an IBM Certified Solution Developer — Infoshere DataStage v8.5 )

So my current ETL tools breakdown goes smth like (not counting PoC and likes):

  • Oracle Data Integrator — 3 projects
  • Pentaho Data Integrator — 2 projects
  • IBM InfoSphere DataStage — 1 project

And that’s my current preference list as well. I love ODI’s flexibility (it’s actually very simple once you get it how it works and it’s extremely configurable), ELT approach (I’d rather be tuning my DBMS than DBMS and a separate ETL engine). PDI is very open and quite user-friendly (compared to DS, for example) and it’s easier to understand & debug than ODI. PDI community edition is enough for most small data sized integration projects and enterprise version is very affordable. Datastage is terrifically well-suited for big data volume tasks and parallel processing, but is quite an overkill in small projects.

It’s interesting that although I did quite a bit of DWH model design I written have just a few posts on this topic. But every time I think about writing out some advice — I think that the best advice is to just go read the books. And if you still have questions — reread them ) I’m reread Kimball’s books a few times already and every time gives you an “ah, that’s what they meant” moment based on your recent experience.

Anyhow, my last couple major DWH projects were for government agencies and I packed a number of simple but effective modeling tips exactly for them. Hopefully I’ll write them out in nearest future. Just need a free weekend or two.

Cognos Express and BI free trials

I must be the last one to notice, but you can actually try out Cognos 10.1 BI and Cognos Express for free (time limited trial). That’s very convenient and is totally a major shift from usual ‘we have products soooo good that you have to ask partners to show them’ policy.

‘Open download and use but paid support’ policy advocated by Oracle (for example) can actually attract more people to work with the products and to buy them eventually, even when they are already at a different company. It really worked well on russian DBMS market, where Oracle is the only ‘big enterprise’ choice.

On the other hand, both BI and Express are quite complex products, so mere effort of trying to make them work in non-obvious tasks can lead to utmost frustration. But both of them are packed with demos and samples, so that should make life a bit easier.

All in all, I’m really glad this is happening and am rooting for time-unlimited trials. You won’t build a serious system without fix packs anyway )

Do you want your TM1 go twice faster on Intel-box? Turn HyperThreading off

This’ll be a bit long (but with a hidden bonus for attentive reader), so I’ll start from conclusions. If you’re using a recent server with Intel CPUs, you’d better check whether they have Hyper-Threading (HT) and try turning it off to gain 2x speed boost.

How to do it:

1) go to server, open command line, type systeminfo (processor info will be in first ten lines). Or if you don’t like it cool, open Control Panel, CPUs are described on second pane )

2) check whether your CPU’s are in this list

3) ask you server admin, or manually reboot server, look into BIOS and turn HyperThreading off

4) Test your TM1 processes, they may speed up considerably

 

What’s behind the scenes.

Hyper-threading, in a nutshell, is a technique showing each physical CPU core as 2 logical ones. Since during normal work a lot of time is spent in thread switching and related registry load/unload and yada yada, physical cores actually are underloaded in common multithreaded system. Adding a duplicate registry set and a ‘virtual’ core allows to utilise physical core up to 20-40% more, earning throughput benefits.

Key words here are “threading” and “throughput”, meaning that HT benefit multithreaded applications where lots of small request are processed at the same time. OLTP databases are a prominent example.

But most of the systems I work with, almost all OLAP engines and even DWH-tuned DBMSes, actually suffer in this scenario.

For example, TM1 calculations are executed in a single thread, so if this thread is assigned half an CPU core, speed drops significantly. The same logic goes for Essbase metadata update process, for example, or for Cognos Enterprise Planning Job processing.

I’ve seen significant performance degradation (30%) in Cognos EP job processing due to HT context switching and have advised turning HT off on all Cognos EP Job servers.

Encountering the same issue with TM1 recently left me puzzled, since there was plenty of empty CPUs on server and there shouldn’t be any thread switching. But we’ve got 2+ time speedup after flickering the switch anyhow )

 

I promised a bonus and here it goes:

When buying a TM1 server with Intel CPUs, buy way more cores than you’ll license. Physical cores, not logical.

This will turn on Intel Turbo-Boost and it might give you another 30-40% speed boost. I haven’t tried that yet and there’s a bit of scepticism of current boost-detection approach in Wikipedia, but it’s definitely worth testing in a lab before you buy hardware. Cores are cheap compared to licence costs, so you may save yourself a ton of money by having a faster system. As in wikipedia example, for Core i7-2920XM you can boost up to 3,5 Ghz per core with 2,5 Ghz base speed. That’s a hefty speed up for a good old one-threaded TM1.

Update 26102012:
IBM official position on hyperthreading for TM1, thanks to James Wakefield for pointing it out.

Cognos Enterprise Planning PAD contents, duplicate entries and deployment problems

Seems like I can rename this blog ‘Reminisces’ any day now. After some years of not prodding into Cognos EP I was asked to look at a rather confusing error. Export deployment wouldn’t work, throwing ‘null’ error initialising object selection dialogue. Yes, as these two IBM support articles (1,2) state: it’s a problem with duplicated objects in PAD. But since was the first time I saw this rather massive instance (80+ apps, hundreds of macros), I was interested in finding a way to automate this task. Although number of EP users is decreasing, hope that anybody facing this issue in future will benefit from this post. And if you’re curious about how things are stored in PAD, scroll over )

 

There are a few things that can be duplicated in PAD, so I’ll describe each situation accompanied with code I wrote to check for duplicated items. Fortunately, export dialogue shows what it’s checking at failure point, so you’ll know where to look.

Macros

All you need to know about macros is store in P_MACRO* tables. P_MACRO holds general macro info, including it’s name that is potential duplicate. EP developers must’ve overlooked the fact that macro name is case-sensitive from Administration Console point of view and case-insensitive from export deployment wizard’s point of view. So if you have two macros: copy_sales and copy_Sales, you’ll find out about this duplication only when you’ll try to export deployment.

Code checking for such situations is very straight-forward.

SELECT * FROM p_macro WHERE
 LOWER(rtrim(ltrim(macroname))) IN
 (
 SELECT LOWER(rtrim(ltrim(macroname))) FROM P_MACRO
 GROUP BY LOWER(rtrim(ltrim(macroname)))
 HAVING COUNT (*) >1
 )
 ORDER BY LOWER(rtrim(ltrim(macroname)))

Found some duplicated macros this way, but the problem persisted.

Macros are also recorded in PAD xml (see below), but table approach is way easier.

Applications

Applications pose a more interesting problem.

ADMINOPTION table

Most of the application details are stored in ADMINOPTION table of each applications scheme (Oracle) or database (Ms SQL Server). I wrote some time ago about updating JVM options for Contibutor client via this table. In my search for duplicates I was interested whether there are duplicate application ids or display names. So it made sense to query all adminoption tables at once.

Here’s a sample Oracle SQL script for such queries. It’s easy to replicate it for MS SQL server (seems that didn’t publish such examples as of yet, mail me if you need them).

CREATE TABLE temp
(USR_NAME VARCHAR(50) NULL,
OPTIONID VARCHAR(250) NULL,
DESCRIPTION VARCHAR(512) NULL,
OPTIONVALUE VARCHAR(512) NULL);
commit;
BEGIN
  FOR usr IN (SELECT owner FROM all_tables
  WHERE TABLE_NAME = 'ADMINOPTION')
  LOOP
  EXECUTE IMMEDIATE 'INSERT INTO temp (usr_name, optionid, description, optionvalue)
  select '''|| (usr.owner) ||''' USR_NAME, OPTIONID, DESCRIPTION, OPTIONVALUE from ' || (usr.owner) ||'.ADMINOPTION' ;
  END LOOP;
END;
 
-- check for duplicate application_display_name
SELECT optionvalue, COUNT(*) FROM temp
WHERE optionid = 'APP_DISPLAY_NAME'
GROUP BY optionvalue
HAVING COUNT (*) > 1

Strangely, such exercise showed me some duplicated names, but detaching those apps from PAD didn’t help, Export wizard was still complaining about duplicated apps.
So I started SQL profiling to see what was really happening under the hood and was impressed by finding

PAD xml

Turns out, all application details are stored in an XML file located in P_PAD table. This file is actually huge and messy, but it contains some fields that are not available in ADMINOPTION table or Administration console output. Deep inside this file, applications are described like:

<applications>
<application id="{94BF5F99-E6AF-4D9C-BA63-E2303055296B}" name="go_capex_contributor" 
objidref="obj_APPLICATION"><properties><property propidref="pr_DATASTORE_CON_DOC">
<![CDATA[<connection>   
...
  <applicationname>    go_capex_contributor   </applicationname>  
</connection>  ]]>
</property><property propidref="pr_DISPLAY_NAME"><![CDATA[go_capex_contributor]]></property>
<property propidref="pr_UNIQUE_ALIAS"><![CDATA[test]]></property>
...
</applications>

And it turns out that the ‘name’ property held duplicates in my case. Since manually extracting properties is tad too tedious for me, I wrote a simple python script that does exactly that. It’s not a complete solution, more an approach sample. I’m using pyparsing as usual. Continue Reading »