We all love tm1s.log files: our lifeline and an invaluable tool in the ‘why the number is X’ scenarios. They tend to get quite big though, so you usually have some sort of ‘zip’em / move’em / delete’em after Y days’ automation. In this post, I’ll look into adding a ‘push’em to the cloud to query’ step :)
There are a few common ways of ‘querying’ the log files, and they all tend to involve the same unzip / search routine. While doing exactly that the other day, I thought there should be another way.
Enter Amazon Athena: a serverless query engine on Amazon Web Services. Athena (a Presto cluster under the hood) gives you a SQL interface to files stored in Amazon S3 buckets, and it works over compressed files as well.
So what you do is:
- add a ‘compress and upload to an S3 bucket’ step to your existing log archiving automation,
- define an external table in Athena that points at that bucket (sample DDL below),
- query the logs with plain SQL whenever the next ‘why the number is X’ question comes in.
Let’s run some numbers.
Say we will be uploading 500 MB of compressed log files daily (that’s around 15 GB of uncompressed transaction log, a pretty decently loaded TM1 system) and want to keep a year’s worth of log files queryable, moving them to Glacier after that (a lot cheaper, but you wouldn’t be able to query them unless you explicitly ‘unfreeze’ them). That’s roughly 180 GB a year sitting in S3 Standard at ~0.023 USD per GB-month, plus the previous year in Glacier at ~0.004 USD per GB-month, so the overall storage cost comes out at around 6 USD a month.
Athena queries cost 5 USD per TB of data scanned, so if you run 5 queries a day over all of last month’s files (~12 GB scanned per query), that’s ~1.8 TB scanned a month, or ~9 USD per month.
With 500 MB a day you shouldn’t be paying for data transfer in / out, but let’s put aside another couple of bucks for it.
Overall, for well under 25 USD a month, you’d get two years’ worth of 15 GB-a-day tm1s logs stored safely with a SQL interface on top.
I would do something similar with BigQuery if you’re on Google Cloud Platform, or with Azure Data Lake if you’re on Azure.
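For instance, a BigQuery external table over the same files in a Google Cloud Storage bucket could look roughly like this (a sketch I haven’t run; the project, dataset and bucket names are placeholders):

CREATE EXTERNAL TABLE `yourproject.yourdataset.tm1s_logs` (
  transaction_id STRING, change_date_1 STRING, change_date_2 STRING,
  client STRING, data_type STRING, value_before STRING,
  value_after STRING, cube_name STRING,
  dim1_element STRING, dim2_element STRING, dim3_element STRING,
  dim4_element STRING, dim5_element STRING, dim6_element STRING,
  dim7_element STRING, dim8_element STRING, dim9_element STRING,
  dim10_element STRING, dim11_element STRING, dim12_element STRING,
  dim13_element STRING, dim14_element STRING, dim15_element STRING
)
-- gzipped CSV straight from the bucket; comma separator and double quotes are the CSV defaults
OPTIONS (
  format = 'CSV',
  compression = 'GZIP',
  uris = ['gs://yourbucketnamehere/folder/*']
)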
Here’s a sample Athena table create for the tm1s log files (each log line carries one element column per dimension of the cube being written to, so size the dimN_element list for your widest cube):
CREATE EXTERNAL TABLE `tm1s_logs` (
  `transaction_id` string,
  `change_date_1` string,
  `change_date_2` string,
  `client` string,
  `data_type` string,
  `value_before` string,
  `value_after` string,
  `cube_name` string,
  `dim1_element` string,
  `dim2_element` string,
  `dim3_element` string,
  `dim4_element` string,
  `dim5_element` string,
  `dim6_element` string,
  `dim7_element` string,
  `dim8_element` string,
  `dim9_element` string,
  `dim10_element` string,
  `dim11_element` string,
  `dim12_element` string,
  `dim13_element` string,
  `dim14_element` string,
  `dim15_element` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar' = '"',
  'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://yourbucketnamehere/folder/'
TBLPROPERTIES ('has_encrypted_data'='false')
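Once the table is in place, the ‘why the number is X’ hunt becomes a plain SQL question. A quick example (the cube name and the yyyymmdd-style timestamp in change_date_1 are assumptions here, adjust to what’s actually in your logs):

-- who changed what in the Sales cube on a given day
SELECT change_date_1,
       client,
       value_before,
       value_after,
       dim1_element,
       dim2_element,
       dim3_element
FROM tm1s_logs
WHERE cube_name = 'Sales'
  AND change_date_1 LIKE '20191022%'
ORDER BY change_date_1

Athena decompresses the files on the fly (assuming gzip, which it reads natively) and bills for the compressed bytes it scans, so if query costs ever start to bite, partitioning the table by date, e.g. a folder per month in S3, would let it scan only the files you’re actually asking about.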