I’ve been a long-term fan of oracle model by clause, talking about here and there, doing some small&funny examples (like eight queens solution in one select) but lacked some real-life necessity to use it. It always turned out that partition by was enough, or some tricky analytical function was already built-in (like trend calculations I wrote about).
And at last, I’ve encountered a suitable problem.
Task is quite simple&straightforward: calculate number of periods in which current stock will be sold in a store. Like: today we have sales enough for 3 weeks. That’s a very common sales demand analysis parameter (Months of Sales (MoS), Weeks of Sales (WoS), DoS and etc).
Formula is quite simple: Current Stock / Average Sales over some period
But there were some difficulties in my case.
I’ve got only partial data, some periods were completely missing, and the requirement was to treat missing sales data as zeros. And I also had to get the last known stock as current stock, because stock data could be missing too.
Last stock is pretty easy to get using last_value oracle function, but getting average sales is hard. You cannot use fixed window width in partition by, since data can be missing. Moreover period varied other the years, so I’ve ended up adding up a field in time dimension called periods_ago, that contained time_key reference from which to sum sales.
So I’ve started with simple subqueries
SELECT time_key, product, company, sales_qty, (SELECT SUM(sales_qty) FROM dwh.sales sales2 WHERE sales2.product = sales.product, sales2.company = sales.company, sales2.time_key BETWEEN weeks.periods_ago AND sales.time_key ) AS running_sales_sum FROM dwh.time_dim weeks, dwh.sales sales WHERE weeks.key = sales.time_key
and that is terribly slow on 10^6+ rows. ~20 minutes.
But with model by it’ll be like:
SELECT TIME, product, company, sales_qty, last_stock_qty FROM dwh.time_dim weeks, dwh.sales sales WHERE weeks.key = sales.time_key model IGNORE NAV PARTITION BY (product,company) dimension BY (time_key) measures ( periods_ago, sales_qty, 0 eight_weeks_sales_sum_qty, NVL(LAST_VALUE(STOCK IGNORE NULLS) OVER (PARTITION BY COMPANY, PRODUCT, SEGMENT, DISTRIB_LEVEL, SYSTEM ORDER BY TIME ASC),0) AS last_stock_qty rules UPDATE ( running_sales_sum[ANY] = SUM(sales_qty)[TIME BETWEEN periods_ago[cv()] AND cv(time_key)] END ) )
around 3 minutes ) Notice the last stock calculation.
Here’s a couple links to good model by examples:
A wonderful tutorial at SQLSnippets
Official documentation restriction section — read that carefully