Applied Dimensionality

How to implement a work queue in TM1

Posted at — Jan 29, 2018
How to implement a work queue in TM1

TM1 / Planning Analytics team is doing an impressive amount of work to make TIs run in parallel, however, the ‘grunt’ part of TurboIntegrator processes is still (as of 10.2.2 / PA 2) limited by one core. Even if you have a beefy server with multiple cores, they would only be utilised in ‘preparing source view with MTQ’ stage and all Metadata / Data will be single threaded. A common way to speed things up is to split the work into independent chunks and run those in parallel via tm1runti. Sometimes you can get close to the linear increase in performance improvement and it can be worth a try.

Just before you embark on this ‘parallelisation’ journey, check whether you can speed things up using more ‘standard’ mechanisms (i.e. read aggregated data, make sure that your code is optimal, etc). My rule of thumb is that a well-done TI processes ~10k cells a second a single core. If you’re in this ballpark, say allocating 100k target cells in a minute, you’re good. If you’re processing 100k target cells in a few hours – there’s something wrong in what you’re doing and there are greater gains in reviewing your logic before throwing more hardware at it.

There are a couple main challenges to run things in parallel:

Data partitioning

You need to select subsets of data that can be processed independently in parallel. This depends on what you’re doing, but it is usually possible to find at least one dimension that can be used for splitting the work. Time dimension is sometimes an easy candidate for this.

Processing in parallel and monitoring the results

This is something that is quite awkward to build in TM1, as there are no built-in thread control mechanisms and it’s exactly what this whole post is about :)

Let’s say we would be splitting a large allocation TI into multiple chunks (by month) and running them in parallel.

We would create 2 TIs:

  1. Worker process, for example, Allocate Data by Month (pMonth as a parameter, statusFile as a parameter):

this process will do all the heavy lifting of allocating data, but source view for it will be limited by a single month that we pass as a parameter

  1. Control process, for example, Allocate Data

this process will check how many months we want to run in parallel (I like to make this a system parameter) and will run tm1runti calls to Allocate Data by Month with required months. Running tm1runti is an asynchronous call ( ExecuteCommand with 0 Wait parameter), so we need to know when each allocation process is complete to launch the next one.

I use files to monitor execution of Allocate Data by Month worker processes by making sure that worker process will write up a status flag file to a temporary folder on startup and completion (something along the lines of Allocate\_Data\_July.Started, Allocate_Data.July.Complete)

Allocate Data control process then becomes a quite sophisticated while loop:

And that’s it, just a bunch of files in a folder + a quite fancy TI to ‘control’ execution and you can be splitting TIs left, right and centre.

For some extra points and cookies:

comments powered by Disqus