Applied Dimensionality

Ways of calculating Running Count to import data into Multi-Line models

Posted at — Feb 1, 2008
Ways of calculating Running Count to import data into Multi-Line models

I’ve nicked the term “Multi-Line” model out from some cognos best practices presentations. Never known it was called that way )

Multi-Line is only way to go when you have a potentially huge dimension only a tiny bit of which should be available to end-user at a time. Like employee planning, whole dimension of 10k people, strictly less than a 100 in department. So you create a fake dimension of 1..100 and add an Employee name column, decreasing cube volume by 100. Access tables + cut-down might help, but it’s sometimes better to allow people pick up any employee,client, product given up that there won’t be more than fixed number of rows.

Main problem with this modeling technique arises when it’s time to import data in such cube. That data usually doesn’t have an 1..100 running count attached, so it’s your task to add it.

In this post I’ll sum up the ideas of how to calculate running count split by elist (that’s usual, ain’t it?).

As a example, I’ll use this simple table:

dept emp salary

And you have departments as an elist, so you have to number rows so that numbering will restart for each dept. Numbering requires an order so let’s alphabetical order of employee names.

So this is what we want to get:

dept emp salary running_count

SQL code for table:

with data_t as
(
select 'Finance' DEPT,'Pete' EMP, 100 SALARY from sys.dual
union all
select 'Finance','Ann', 200  from sys.dual
union all
select 'Finance','Jo', 300  from sys.dual
union all
select 'HR','Nick', 100  from sys.dual
union all
select 'HR','Sam', 200  from sys.dual
)
select * from data_t

So what are our choices?

0 Use a framework package as source and a running-count calculation in query subject

Only since Cognos 8.2, slower than some variants, but definitely most cognos-style.

Create a query subject and add an expression of running-count like

running-count ( [SALARY] for [DEPT] )

1 Add sub query to count this rows position within elist slice.

That’s “works everywhere, but terribly slow” solution. Easy to do, use it if you’re about to import a 100-1000 rows table and really don’t care about time it takes.

Sample code:

select DEPT,EMP,SALARY,
(
select count(*)
from data_t T2
where T2.DEPT = T.DEPT
AND T2.EMP <= T.EMP
) as running_count
 from data_t T
 order by dept,running_count

2 Use SQL'99 standard rownum() function

Way much better optimized (in Oracle ) ) than subquerying variant above. Problem is: works only on Oracle, Ms SQL 2005+ and DB2. Ms SQL 2000 import – sorry, no SQL'99 there. See original syntax

Sample code:

select DEPT,EMP,SALARY, row_number()
over (partition by dept order by emp) from data_t

3 Use cursors to calculate rownumber**

Brute-force solution, non portable, as fast as possible. Only way to do serious lifting in MS SQL 2000.

I’ll place sample code for Ms SQL 2000 if somebody needs it :)

PS You’re on Oracle 10g+? Lucky you, you can also try to do running_counts it via oracle model by clause.

comments powered by Disqus