Using Views In Your Warehouse Model
One trick that I love to use is for my DBA to always create an associated View in the database for each Table in the warehouse. I reference these Views just the same via Warehouse Catalog/Architect, and I make it a point to never hit the Tables directly. There are some that will have a fit and say that I’m going to suffer greatly in performance even for simple “select *” views. While that may be true in some environments, my own personal testing has never reached those results. to be sure, hitting the Views doesn’t have any performance benefits, but the real gain is flexibility for both me and my DBA. Today I’ll discuss this strategy and if you’re not using Views today, you could be missing out on some handy solutions to your problems.
The hardest part of Business Intelligence is ETL. I like to say that ETL is “95% of the work”, but they don’t get any of the glory. While we sleep soundly at night as our users are asleep, those wee hours of the night are usually the most hectic for ETL Administrators. The majority of projects run the majority of their ETL in off hours, and any kind of unforeseen source system changes will surely creep up at those times. Considering as MicroStrategy Developers, we’re the face of the project and reap all of the glory, I always try my best to give credit where it’s due.
I also try to help out as much as I can, and one of those ways is by striking a balance between where to solve a problem. MicroStrategy tries to provide as many options as possible, so when a new requirement arises, do we build a Custom Group? A Report as Filter? An ApplySimple attribute? Or do we go back and request that ETL save us again and do the work for us so that we can have our simple “select * from FactTable” reports. Surely, that balance includes performance considerations. As easy as it is for me to create a Custom Group and save everyone a lot of work, if the report takes 5 minutes to run vs 5 seconds if they add it to the table during ETL, it’s just not a sacrifice we can make.
This is where using Views comes in. One scenario I had recently went like this:
We have a dimension that gets loaded from multiple 3rd party data sources. But not every element comes from an individual source, some elements are contributed to by multiple sources, and some elements don’t have a source at all because they are inherent to the system. This means that Data Provider is a Many-to-Many to Dimension, but that not all Dimensions have a Data Provider. Using the classical relationships, I could end up with duplicate rows or dropped rows. But, I still need this table in this structure to support filtering where a user only wants to see Dimensions from a particular Data Provider.
So how would you solve this? Surely there are some gymnastics you can do with Attribute Join Types and Level Metrics, but I want a more robust systemic solution and not one that has to be laboriously implemented per report. No doubt, this solution is going to come from ETL.
In our case, these two tables aren’t very large. Both tables are in the six-figures in rows, and for our platform that’s a spit in the bucket. So our solution was to edit our existing V_Dimensions view and add a simple left join and SQL to append a single column that lists all of the Data Providers for that Dimension record, comma separated. In MicroStrategy, I created this as a separate attribute, so I now have “Data Providers” and “Data Provider Breakout”, where the first is the single value string concatenation and the later is the Many-to-Many join.
The result was something very easy for ETL to implement. They didn’t have to write any complicated mappings or schedule yet another workflow for the wee hours. I didn’t have to do any kind of crazy MicroStrategy sorcery and I have the answer directly provided to me. The users can prompt on Data Provider and yet also receive the list per row without duplicates. It’s an elegant solution made possible by Views.
Now, if we weren’t already using views, and prior to this requirement, the V_Dimension was just “Select * from Dimension”, this would have been a lot harder to implement. It would have either required ETL, or a separate table to be loaded and now I’ve got yet another Logical Table / Join / Relationship. Compacting that logic into the view isolates me from the complication and makes debugging my reports a lot easier.
It’s a slippery slope, because the more you take this shortcut, the worse performance will get. If querying the Dimension table takes 1 second, and querying the view that now has this join in it takes 2 seconds, I’ve already started to slip in performance. In this case, I think a report that takes 1 second longer is worth avoiding the administrative overhead and development time is worth it, but keep in mind that it means every time that table is accessed, it will be dealing with those extra joins in the views whether the report needs them or not.
Another advantage is that it makes renaming tables/columns much easier. Prior to 9, if you needed to rename a warehouse table or column, it was a nightmare of object juggling (Create new, remap, remove old). In 9 with the Architect tool, you can now rename warehouse tables directly, though columns are still a pain to juggle. Using views makes it easier to handle both scenarios as you can either leave the views alone, or do an additional column select in the view to give you that intermediary column for your object moving. You also get the advantage of being able to try out new schemas without a lot of work, and then have the choice to go back and convert them to physical tables should you decide to proceed with those changes.
Overall, I think views offer a project tremendous flexibility, so don’t be so quick to dismiss them because you are prejudice against their perceived performance impacts. As always, consult your DBA and at least give it a try.