Had another TM1 / PA archiving project recently, so here's a bit of a write-up of how I suggest doing these :)
A common TM1 practice is to add 'snapshotting' / 'copy' / 'publishing' functionality that lets you copy the current working data set to another version so you can refer to it later on. Same logic as copying your working spreadsheet to a 'latest_v3_final' one :)
This copy is usually done by adding a new element to a 'version' dimension and copying data from 'current' to the newly created element. Ideally the copy elements will not have rule calculations, for performance & storage reasons.
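For reference, a minimal TI sketch of such a copy process could look roughly like the below. All the cube, dimension, view and parameter names ('Sales', 'Version', pSnapshot, ...) are made up for illustration; the real process obviously depends on your model.

```
# Prolog tab: create the snapshot element and build a source view over 'Current'.
sCube   = 'Sales';
sVerDim = 'Version';
sView   = 'Copy to Snapshot';

# pSnapshot is assumed to be a process parameter holding the new version name
IF( DIMIX( sVerDim, pSnapshot ) = 0 );
    DimensionElementInsert( sVerDim, '', pSnapshot, 'N' );
ENDIF;

IF( ViewExists( sCube, sView ) = 1 );
    ViewDestroy( sCube, sView );
ENDIF;
ViewCreate( sCube, sView );
IF( SubsetExists( sVerDim, sView ) = 1 );
    SubsetDestroy( sVerDim, sView );
ENDIF;
SubsetCreate( sVerDim, sView );
SubsetElementInsert( sVerDim, sView, 'Current', 1 );
ViewSubsetAssign( sCube, sView, sVerDim, sView );
ViewExtractSkipCalcsSet( sCube, sView, 1 );
ViewExtractSkipZeroesSet( sCube, sView, 1 );

DatasourceType          = 'VIEW';
DatasourceNameForServer = sCube;
DatasourceCubeview      = sView;

# Data tab: write every 'Current' cell to the snapshot version.
# Variable names (vMonth, vAccount, nValue) depend on the cube's dimension
# order and on how the data source variables are named.
CellPutN( nValue, 'Sales', pSnapshot, vMonth, vAccount );
```

In real life you'd also want to handle string cells, which is exactly what the export / import discussion further down is about.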
Once you have such a 'copy version' button in place and a few years go by, you'll be looking at a largish model that keeps growing in memory size all the time. The next version of PA will start charging directly by memory size, but even on V11 you might run into having to expand the host server memory and will experience increasing server startup and backup times.
You can either:
- keep expanding (and paying for) host server memory to hold versions that are rarely looked at, or
- archive the old versions out of the model to cheaper storage and bring them back only when somebody actually needs them.
Overall we want to have an 'archive' functionality that will:
- export the selected version's data from each cube to flat files (sketched below),
- compress the export files,
- upload the compressed files to (cloud) storage and verify they arrived, and
- only then clear the archived data from the cubes.
And, obviously, a 'restore' functionality that will do the same steps in reverse:
- download the compressed files from storage and check they are all there,
- decompress them, and
- import the data back into the cubes.
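To make the export step a bit more concrete, here is a hedged sketch (file, cube and view names are again placeholders, and the archive view is assumed to be built in the Prolog the same way as the copy view above):

```
# Export process, Data tab: write each cell of the archive view to a flat file.
# cExportFile would be built in the Prolog, e.g. from the cube and version names.
TextOutput( cExportFile, vVersion, vMonth, vAccount, NumberToString( nValue ) );

# The clear is deliberately a separate process, run as a later chore step only
# after the compressed files have been uploaded and verified.
# Prolog tab of the clear process - rebuild the same version view and wipe it:
ViewZeroOut( cCube, cArchiveView );
```

Keeping the clear in its own process is what lets the chore bail out (see the ChoreQuit discussion below) before any data is actually removed.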
Ideally you’d want to run export, import and clear steps in parallel to speed things up. I prefer to split the execution by cube to make it simpler, but you can use dimension elements (i.e. export months in parallel) as well, just need a common enough dimension.
I prefer running the overall process as a multi-commit chore (to avoid holding locks any longer than necessary) and calling ChoreQuit at any step where something goes wrong, to avoid removing data that is not yet securely stored. The 'something went wrong' checks are: the export or import failed (TI process error), the compression / decompression failed (we don't see the files we expect), or the cloud storage operations failed (more on this below).
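A sketch of what such a check can look like inside a chore step (process, variable and file names are invented for illustration):

```
# Run the export and stop the whole chore if it errors or the file we
# expect to compress / upload afterwards is not there.
nRet = ExecuteProcess( 'Archive.Export Cube', 'pCube', pCube, 'pVersion', pVersion );
IF( nRet <> ProcessExitNormal() );
    LogOutput( 'ERROR', 'Export failed for ' | pCube | ', aborting archive chore' );
    ChoreQuit;
ENDIF;

# Same idea after the compression step: no compressed file, no point carrying on.
IF( FileExists( cDataDir | pCube | '_' | pVersion | '.zip' ) = 0 );
    LogOutput( 'ERROR', 'Compressed file missing for ' | pCube );
    ChoreQuit;
ENDIF;
```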
I had a bit of an epiphany around the import & export processes on the most recent deployment of the archiving solution. Epiphany might be a strong word, but I get very excited these days by the simplest of ideas, particularly if I come up with them myself :) For years I've been using a fairly standard export / import process, largely inspired by Bedrock's copy process, and all this time I'd been struggling a bit with string elements. Strings can be input on consolidated elements, so you either have to run the copy process with SkipConsolidations turned off and churn through all the consolidated numeric cells that will just be ignored, or run it twice with different element filters. Rewriting it as two separate export & import processes, one for numbers and one for strings, makes it a lot simpler to reason about from my point of view (the string pass always includes consolidated elements), as you can loop through the last dimension of the cube to filter the numeric & string elements. All this can be done 'on top' of the normal copy process by generating filter strings, but separate processes feel more straightforward to me.
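A sketch of that idea (names made up as before): loop through the last dimension of the cube once and split its elements into a 'numeric leaves' subset and a 'strings' subset, then point the two export views at them.

```
# Prolog of the export process: find the last dimension of the cube and split
# its elements into numeric leaf elements vs string elements.
sLastDim = TABDIM( pCube, 1 );
i = 2;
WHILE( TABDIM( pCube, i ) @<> '' );
    sLastDim = TABDIM( pCube, i );
    i = i + 1;
END;

sNumSub = 'Archive Numeric';
sStrSub = 'Archive String';
IF( SubsetExists( sLastDim, sNumSub ) = 1 );
    SubsetDestroy( sLastDim, sNumSub );
ENDIF;
SubsetCreate( sLastDim, sNumSub );
IF( SubsetExists( sLastDim, sStrSub ) = 1 );
    SubsetDestroy( sLastDim, sStrSub );
ENDIF;
SubsetCreate( sLastDim, sStrSub );

nNum = 0;
nStr = 0;
j = 1;
WHILE( j <= DIMSIZ( sLastDim ) );
    sEl = DIMNM( sLastDim, j );
    IF( DTYPE( sLastDim, sEl ) @= 'N' );
        nNum = nNum + 1;
        SubsetElementInsert( sLastDim, sNumSub, sEl, nNum );
    ELSEIF( DTYPE( sLastDim, sEl ) @= 'S' );
        nStr = nStr + 1;
        SubsetElementInsert( sLastDim, sStrSub, sEl, nStr );
    ENDIF;
    j = j + 1;
END;

# The numeric export view uses sNumSub on the last dimension and skips
# consolidations / rule calcs; the string export view uses sStrSub and keeps
# consolidated elements on the other dimensions, so string input on C-level
# cells gets exported too.
```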
Lastly, if you’re using cloud based storage, I’d recommend running a list
command after the upload (or download) to verify that you’ve received / sent all the files you’re expecting, just as another precaution.
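A sketch of how that can be wired up from TI (the 'cloudcli' command, bucket path and file names are placeholders; use whichever CLI your storage actually provides):

```
# After the upload step: list the target bucket / folder into a local file.
# 'cmd /c' assumes a Windows host; swap in the equivalent shell call on Linux.
ExecuteCommand( 'cmd /c cloudcli ls bucket/tm1-archive/' | pVersion
    | ' > ' | cDataDir | 'upload_listing.txt', 1 );

IF( FileExists( cDataDir | 'upload_listing.txt' ) = 0 );
    LogOutput( 'ERROR', 'Could not get a listing back from cloud storage' );
    ChoreQuit;
ENDIF;

# A follow-up process can then use upload_listing.txt as its data source and
# ChoreQuit if any of the files expected for this version are missing from it.
```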