Incremental Refresh of Intelligent Cubes
Arguably, the headliner feature of MicroStrategy 9 was the introduction of Intelligent Cubes. This powerful feature revolutionized project speed and design by allowing lightning-fast data access directly from memory. Once implemented, it gives the developer the best of both the ROLAP and MOLAP worlds without the drawbacks of either: the speed of MOLAP data requests with the flexibility of ROLAP ad-hoc analysis.
With each release following the 9 introduction, Cubes have gotten faster to load and smaller to store in memory, but two main drawbacks have existed:
1) Not all features are supported in Cubes, and even fewer are supported for Dynamic Sourcing. This is getting better with each release, but until support for Custom Groups and per-pass Dynamic Sourcing is added, system-wide adoption will remain a dream.
2) You have to fully repopulate the entire cube each time you want to update it. This makes some scenarios infeasible, as the load time would not fit into potential ETL windows.
Thankfully, MicroStrategy has solved the second issue with the introduction of Incremental Refresh for Intelligent Cubes in the 9.2 release. Today, I’ll talk about this exciting new feature and how it works for you.
Intelligent Cubes vs Caches
An Intelligent Cube is essentially a report whose result set is stored in memory, with a copy kept on the file system. After a reboot, that file copy is reloaded into memory to avoid reprocessing, and should you run out of memory, the cube that hasn’t been accessed for the longest time is unloaded from memory while its files remain. The saved result set differs from a traditional cache file in that MicroStrategy can query against it.
If you run a report for North East Region, 2010 by Month, that single result is stored in a cache file. If you run that exact report again, then you get the result instantly. But if you change that query by requesting North East Region, 2010, March only, then the report has to run again, even though the cache file does contain a row for your answer.
This is where Intelligent Cubes come in. If you populated an Intelligent Cube with North East Region, 2010 by Month, then any subset query from that dataset can also be answered from it. This means that instead of pre-caching specific reports, you would build a more generic Intelligent Cube to fit a wider array of requests. In this example, you’d probably build a cube for All Regions, All Years, All Months. This miniature database would sit in memory, and any request using those objects would be returned instantly as if they were cached.
Another powerful difference is its use in conjunction with Security Filters. Say we both run the same NE, 2010 report, but I’ve got a security filter for the NE Region only and you don’t. Even though we’re requesting the exact same thing, our different security filters dictate that we can’t share a cache, so the report runs twice. With Intelligent Cubes, the security filter can be resolved against the cube.
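Conceptually, the cube behaves like a small in-memory table that any subset request (including a security filter) can be re-applied to, whereas a cache can only answer the exact request it stored. A toy illustration in Python (this is a sketch of the idea, not MicroStrategy internals; the row layout is invented):

```python
# Toy illustration: a cache answers only the exact request it stored,
# while a cube's rows can be re-filtered to answer any subset request.
rows = [
    {"region": "NE", "year": 2010, "month": "Mar", "sales": 120},
    {"region": "NE", "year": 2010, "month": "Apr", "sales": 90},
    {"region": "SW", "year": 2010, "month": "Mar", "sales": 75},
]

def query(cube_rows, **criteria):
    """Answer a subset request (or apply a security filter) in memory."""
    return [r for r in cube_rows
            if all(r[k] == v for k, v in criteria.items())]

query(rows, region="NE", month="Mar")   # the March-only request
query(rows, region="NE")                # my NE-only security filter applied
```

Both calls are answered from the same in-memory rows with no database trip, which is the whole point of building the cube more generically than any single report.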
Full Refreshing Intelligent Cubes
The traditional method of refreshing an Intelligent Cube is to run the cube (it’s a variation of a report and runs in the same editor), and the results are populated to memory instead of being displayed on the screen. You can also right-click the Intelligent Cube and choose Schedule Delivery To -> Refresh Cube, which lets you choose a Schedule and have the cube refreshed automatically. Both of those approaches result in a full reload of the data into memory, so if you’ve made your cube as generic as possible to cover as many scenarios as possible, it could take a while.
Incremental Refreshing Intelligent Cubes
With 9.2, there is now a second option for refreshing the cube: right-click the Intelligent Cube and select Define Incremental Refresh Report.
You’re then provided with a new option dialog to define the method of refresh you’d like to take:
1) Update – Insert new rows from the report and replace any overlapping data.
This would be the most common method of Incremental Refresh. If you’ve got a cube with daily data for the last year, you may choose to schedule your Incremental Refresh to load the last 3 days of data each day. This way you get the new day, and potentially a little overlap in case there were ETL issues or source system changes you didn’t previously catch.
2) Insert – Only insert non-overlapping rows from the report.
This would be used if you were certain that you didn’t need to change data (or perhaps purposefully intended not to) and want to ignore any potential overlaps.
3) Delete – Remove overlapping rows from the data.
This would be used to trim your cube to prevent growth past what you’re interested in. If you’ve got 10yrs of data in your warehouse, you may only want to load a rolling 3yrs of data in your cube. You could do that rolling at the month level, so once a month you’d delete the last month from 3yrs ago. Defining a report that returns that data and running in Delete mode would remove those rows from the Cube to give you a manageable size.
4) Update Only – Only overwrite overlapping data from the report.
This would probably only be used for patching data, say from an ETL error or source system fix. If data was retroactively changed, you could update just that portion without having to reprocess the entire cube.
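The four options above amount to set operations on keyed rows. A minimal sketch in Python (hypothetical, not MicroStrategy’s actual implementation), treating the cube and the refresh report’s result set as dicts keyed on attribute values:

```python
# Hypothetical sketch of the four incremental-refresh modes, modeling the
# cube and the refresh report's result set as dicts keyed on attribute values.
def refresh(cube, report, mode):
    """Return a new cube dict after applying the refresh report."""
    result = dict(cube)
    overlap = cube.keys() & report.keys()
    if mode == "update":          # insert new rows, replace overlapping ones
        result.update(report)
    elif mode == "insert":        # only add rows the cube doesn't have yet
        for key, row in report.items():
            if key not in overlap:
                result[key] = row
    elif mode == "delete":        # remove the rows the report returned
        for key in overlap:
            del result[key]
    elif mode == "update_only":   # overwrite overlapping rows, add nothing
        for key in overlap:
            result[key] = report[key]
    return result

cube = {("NE", "2010-01"): 100, ("NE", "2010-02"): 200}
report = {("NE", "2010-02"): 250, ("NE", "2010-03"): 300}
refresh(cube, report, "update")
# January is untouched, February is replaced, March is appended
```

In all four modes the database query is the same; only what the engine does with the returned rows differs, which is why a small refresh report is so much cheaper than a full reload.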
The editor for an Incremental Refresh Report is simply the Report Filter portion of a traditional report. You’re defaulted to the existing filter that the target cube is using, but you can modify it in any way you wish. I’d imagine you’d want to keep the existing filters in place, but then add some new ones.
Example, if my original cube was All Regions, All Years, All Months, then I may define an Incremental Refresh Report to just add Last Month as the filter. I can then schedule it to run on the 1st of the month, and it will simply run a database query for 1 month of data and append it to my existing Cube with historical data.
Once your report is complete, you simply right click on it and choose Schedule Delivery To -> Refresh Cube just like before. Only this time, the smaller refresh report is running and updating your cube in a fraction of the time.
In my environment, I’m currently processing about 2hrs worth of Intelligent Cubes every day. With the deployment of 9.2, the daily update window for these cubes is expected to fall below 5mins. That gives us incredible flexibility in loading the cubes for two reasons.
1) Currently we load data into the warehouse in near realtime, but 1am – 5am is the window for daily maintenance jobs on the server. While ETL continues to run in that window, traffic on the server is high and everything is slowed down. Even though the data for the previous day’s consumption (and all accompanying aggregates and table statistic refreshes) is available for my cubes by 3am, it wouldn’t be feasible to run all of those huge database-dumping queries during that traffic, so we wait. This also gives us a buffer in case of an ETL issue, leaving extra time for long-running jobs or re-running of jobs. We can’t use a technique such as having the ETL trigger a Command Manager script to load the cubes, since the trigger would come during the 3am hour, and there are too many other parallel processes running to determine a good combination of factors for triggering the cubes, leaving us with a blind 6:30am starting schedule.
However, with Incremental Refresh, the queries are small enough that they can brave that heavy morning traffic. This allows us to make the cube data available 3hrs earlier than before.
2) Now that we can run the cube refreshes at any time we’d like, we can set up a Command Manager script trigger event to run from the related ETL processes that finish at around 3am. This lets us not only have the data available as soon as possible, but also means that in the event of an ETL issue, we don’t have to worry about impending schedules. This gives the ETL team a great deal more time in the morning as well as flexibility in their own schedule handling (not to mention the pleasant side effect of not having to wake me up to stop the schedules ;)).
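The ETL-triggered approach can be sketched as the last step of the ETL job shelling out to Command Manager. A minimal sketch in Python (all names, paths, and the event name are invented for illustration; the -n, -u, -p, and -f flags are Command Manager’s documented options for project source, user, password, and input script file):

```python
# Hypothetical sketch: the final ETL step invokes Command Manager to fire
# the event that the cube refresh subscriptions are attached to.
import subprocess

def build_cmdmgr_call(project_source, user, password, script_file):
    """Assemble the cmdmgr command line; kept separate so it can be inspected."""
    return ["cmdmgr", "-n", project_source, "-u", user,
            "-p", password, "-f", script_file]

# trigger_cubes.scp would contain a single Command Manager statement such as:
#   TRIGGER EVENT "Nightly Cube Refresh";
cmd = build_cmdmgr_call("MyProjectSource", "etl_user", "secret",
                        "trigger_cubes.scp")
# subprocess.run(cmd, check=True)  # commented out: needs a live I-Server
```

The event-based subscription then fires the moment the ETL finishes, instead of waiting for a fixed clock time.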
Your articles are very good. Please keep them coming.
Do you have a sample script for triggering an event after ETL completion? I am specifically interested in triggering from a Unix server (where the ETL runs) to a Windows I-Server.
Check back for Monday’s post on this very topic!
My cube has a Week Age attribute in it, so every week the age will change. Because of this I have to do a full refresh of the cube. Is there any way I can update only the week age each week?
Thanks for your awesome knowledge-sharing articles. You really rock!
@Params, I’m not 100% sure if it updates the entire row or just 1 cell in the event that a change occurs, but since every row in your cube would need to be updated, you’ll have to run the full query against the database regardless. You won’t be able to update it with a partial query, so either way it’s going to be faster to just do a full refresh.
Hi Bryan, let me know the difference between Windows and Unix environments in MSTR.
Hi Bryan.. Thanks for the helpful articles..
I have come across an issue while publishing the Incremental refresh using the script –
PUBLISH INTELLIGENT CUBE “” IN FOLDER “” FOR PROJECT “”;
CmdMgr throws an error saying the object doesn’t exist, while the cube (full refresh) executes fine.
I think it’s because an Incremental Refresh Report is considered a Report and not a Cube. Try just assigning the report to a schedule and triggering the schedule like a normal subscription. Note that there’s a bug in 9.2.1 where an Incremental Refresh report has to be assigned to your History List schedule and NOT to “Refresh Cache”, or it’ll fail.
Thanks Bryan. That helped. Another related question: I have to publish two incremental cubes sequentially, one after the other. Is there a possible way without using the MSTR Scheduler or SDK (as those two options have some complications in my project)?
Also, Project Design Guide 8 mentions a trigger that fires when we try to delete an object, which searches for dependent objects. Is it possible to create such internal triggers to check the cube status?
Sadly, there isn’t a way to cascade refreshes like you’re describing. You currently have to setup an external process to monitor the progress of the first cube and then trigger the second cube with Command Manager when it’s done. A common technique is to use a status table in your database and use Post-SQL in the first cube to update that status. Another job would monitor for a status change and then that would be the indicator that the second cube is ready to start. It’s messy for sure, but it’s the only option right now.
The trigger you mentioned is just internal behavior and not anything that can be modified or added to.
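The status-table handshake described above can be sketched like this (a hedged illustration: the table, column values, and trigger mechanism are all invented, and the real version would poll an actual warehouse table and shell out to Command Manager):

```python
# Hypothetical sketch of the status-table handshake: cube 1's Post-SQL
# writes a DONE flag, and a small watcher polls for it before triggering
# cube 2 via Command Manager. The database is faked with a dict here.
import time

status_table = {"CUBE_1": "RUNNING"}   # stand-in for a real warehouse table

def wait_for(cube_name, poll_seconds=1.0, timeout=10.0, trigger=print):
    """Poll the status table; fire `trigger` once the cube reports DONE."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if status_table.get(cube_name) == "DONE":
            trigger(f'TRIGGER EVENT "Refresh {cube_name} successor";')
            return True
        time.sleep(poll_seconds)
    return False
```

In practice the `trigger` callback would invoke cmdmgr with that TRIGGER EVENT statement; here it just receives the script text so the flow can be seen end to end.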
Thanks Bryan. Will try that.
While using Incremental Refresh –
I have found that the SQL for Insert, Update, and Update Only didn’t differ much in my cube’s query.
Will this be the case everywhere? If so, how will the cube differentiate between them and perform the right function?
The query just returns the relevant data. The difference will be what the cube DOES with the data, which is determined by comparing the query results with the current values in the cube to find what action it should take.
I have a question. Can we run a report against the cube while it is refreshing?
Yes you can, as long as the cube has completed previously. While it’s loading, the original cube is still accessible, and once it’s finished, it’ll do a quick swap in memory.
Do you know what the overhead is on cube processing? Meaning that once I am processing a cube, I can go to the DB and see the job running against it, but once the DB job disappears (finishes), my Cube screen still shows “running report” for a good while after the DB job has returned the results. I don’t know what it is trying to do with the data once it receives the results back from the DB. Does that make sense?
There is some additional processing and transforming of the data in memory, as well as the creation of some indexes on it. There are probably some low level logging traces you can enable to get some insight into those processes, but I don’t think there’s anything you can do to tweak them, so it wouldn’t be more than informational. If you’re concerned about some really long running ones, you should talk to tech support about it and they may give you some tips on different ways to structure the result set to make it easier for MicroStrategy to consume it.
Hi Bryan, when I am trying to use Intelligent cube in one of my Mobile dashboard, my App is crashing. Can you please help me with this?
Sounds like a job for Tech Support to me. Not much I can do for crashes.
Can we use 2 cubes in 1 dashboard?
Yes, but you have to create View Reports on top of them first. Currently, you can only directly link 1 cube to a document.
I am currently trying to implement some incremental refresh reports. Our full cube build looks back 2 years and we want the refresh reports look back 5 days. After doing some testing it seems we are kind of stuck on a limitation though. Our cube (as I would assume most do) hits several fact tables and contains outer joins with the metrics. The problem is:
For the refresh report (last 5 days) a metric may return null if there isn’t an entry in the fact table for the last 5 days. In reality the value for this metric should not be null, it’s just that there is no value in the last 5 days. If this metric was alone in the refresh report, it would not return a row and nothing would be updated.
The problem arises when another metric changes in the same row. Then this null is returned, and when the cube is updated it brings in this null (incorrect value).
I tried to do my best to explain the problem, not sure if it was clear. Wanted to know if you’ve encountered this issue? The refresh report is working as expected but it returning null values seems to be a serious limitation.
Why would the value not be null on a specific date if you ran the report for a longer date range? Are you using some kind of Ending on Hand metrics? If that’s the case, surely there’s a business rule in place where running it for 30 days or something makes sense?
Exactly the type of metric I am talking about. It is a good suggestion, I’ll see if there is some standard for when these values are updated/changed.
It also seems that level metrics we have are changed by the filter. A metric supposed to calculate at the year level cannot seem to do so properly when we are filtering less than a years worth of data.
These are very new to me and still trying to understand them. I do appreciate your response and all the posts as well, they are very insightful.
How does incremental refresh reflect on a cube when an update SQL statement is run against the database? I see it retains the old row-level data and inserts the new updated data. Is that expected?
If we have report caching enabled at the project level and run an Intelligent Cube report, where will the data be retrieved from: the Intelligent Cube or the report cache?
A cube report is always fetched from its cube.
Hi Bryan,
Can the cube be updated whenever there is an update or insert in the DB, without any time limits?
Not automatically. You’ll have to have your ETL process trigger the cube directly using command manager.
Hi Bryan – appreciate your efforts!!
From the above Incremental Refresh options – is there a way to Delete then Insert in a single Incremental Refresh Report? I mean, I have a cube refreshed on a monthly basis which contains data for the past 12 months at any point.
I do appreciate your efforts! Delete+Insert = Update.
Thank you Bryan. I was wondering what algorithm is used by MicroStrategy to implement the Incremental Refresh feature for Intelligent Cubes.
Hi Bryan,
Is there an automated way we can monitor the cube status (when it becomes dirty), without manually checking the cube’s cache in Desktop?
Can we run a report against the cube while it is being refreshed incrementally (with Update option)?
As long as it has been updated at least once before, yes.
How do you publish the initial cube without having to load all the data, instead using the scheduled refresh option to load chunks of data? As I see it, the cube must exist and be published first.
Can anyone tell me (or link to TN) on the logic behind an incremental refresh update for a cube? I’m having an issue where the update is actually inserting a new record and keeping the old record when a specific set of attributes are changed in the database. So, I just need to understand how an update works.
I am using MSTR 9.3.0
If you’re using the update option for incremental refresh, then whichever attribute’s data gets updated will be added to the cube as a new row, and the old record will be kept as it is. If you want to add just the updated records, I think you can try the insert option.
Hope this helps you out.
Can you suggest a good book with lots of examples of the MicroStrategy tools?
This is the only book on the subject that I’m aware of, but I have not read it: http://www.amazon.com/Business-Intelligence-MicroStrategy-Cookbook-Moraschi/dp/1782179755
I am quite new to MicroStrategy, especially cubes. My organization has a couple of cubes, and the server is running 120GB of RAM. Still, when the cubes are refreshed and users are running their reports, the system hangs, and every time we have to restart. Could you please help me do some diagnostics as to where I should look for optimizations? In my view there is some serious problem or setting we are missing, as 120GB is a huge amount of RAM.
I would suggest checking how much data you are loading into the cube, and also how you have set up the incremental refresh; see Bryan’s explanation above and trim it to the minimum amount of data required. You should also look into what other tools are installed on the server; 120 gigs should be much more than sufficient for MicroStrategy alone.
I hope this helps.
I’ve met with many difficulties working with Intelligent Cubes, most of them caused by Incremental Refresh. You have to be very cautious in how you design the cube and its incremental refresh reports, because the Update option won’t delete the rows that have since gone missing from the source. For example, you don’t want to put any kind of “Status” attribute in such a cube, because at first you could have Customer 1, Status A, $100. Then an incremental refresh that fetches Customer 1, Status B, $30 will insert that as a new row, leaving the older one in place as well.
That’s why it is always good to first delete the portion you are going to update, but I don’t know how to make all those executions sequential, so that the moment the delete finishes the update begins. Any ideas?
Is there any way to correct the history records in the cube for any attribute value or metric value changes?
I’m having the same problem Nacho mentioned. When underlying attribute values change, the “update” option just creates additional rows in the intelligent cube. See http://community.microstrategy.com/t5/Architect/TN45837-How-does-the-Incremental-Refresh-option-work-in/ta-p/195508
It looks like the best solution available to me is to define a limited range of data that I want to update, say the last 2 weeks. Then:
1) Use a Delete operation first for the date range I want to update.
2) Insert the data for that same date range, after the Delete has completed.
It would be nice if the tool simply combined the two operations into a “Replace” operation, and I’m confused as to why this option is missing. It seems like the most common use case for incrementally refreshing data.
To make the above steps happen in sequence, I think you have to write command manager scripts, although there are probably other ways using Narrowcast or SDK.
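The stale-row problem and the Delete-then-Insert workaround can be illustrated with a toy model (hypothetical; the key point is that cube rows are identified by all of their attribute values, so an Update never matches the old Status A row):

```python
# Hypothetical illustration of why Update leaves stale rows when an
# attribute value changes: the cube keys rows on ALL attribute values,
# so (Customer 1, Status A) and (Customer 1, Status B) are different rows.
def update(cube, report):
    return {**cube, **report}                 # insert + replace exact overlaps

def delete_then_insert(cube, report, date_range):
    # Step 1: delete every cube row in the date range; step 2: insert fresh.
    kept = {k: v for k, v in cube.items() if k[-1] not in date_range}
    return {**kept, **report}                 # the missing "Replace" operation

old = {("Customer 1", "Status A", "2015-06"): 100}
new = {("Customer 1", "Status B", "2015-06"): 30}

update(old, new)                      # keeps BOTH rows: Status A lingers
delete_then_insert(old, new, {"2015-06"})   # only the Status B row survives
```

This is exactly the two-step Delete/Insert sequence described above, which is why the steps have to run in order and why some external sequencing (Command Manager scripts, a status table, or similar) is needed.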