Sometimes the speed of business can’t wait for the perfect model and information flows to feed it. Or perhaps it’s just not practical to ship data around to and from multiple locations due to it’s size or timing. Maybe you just need to do a proof of concept or to borrow a few bits from an operational system. MicroStrategy’s Multisource feature aims to assist in all of these scenarios by allowing a single MicroStrategy report to access data from completely separate servers, even if they’re different database platforms entirely. Of course, I’ll discuss a few tricks you can leverage with this technology over your traditional schemas to gain dramatic performance or reporting flexibility as well.
How to Setup Multisource
In order to have Secondary Database Sources available, you must first edit the Project Configuration and go to the Database Source section, and check the box to make them available. You can now edit your schema from within the Warehouse Catalog or Architect Tool. From Architect, simply drag tables from alternate database sources into your project. The tables will display as different colors to designate the source they belong to. In Warehouse Catalog, drag the tables into your project as normal and they’ll be tagged as Primary / Secondary. Primary tables are those that belong to the default DSN of the project, while secondary are those that belong to any other DSNs.
How Multisource Works
MicroStrategy leverages the databases to move the data as opposed to handling large joins in memory with virtual data sets. With a similar algorithm to how it uses Logical Table Sizes to determine which table is best to source a query, MicroStrategy determines which database in a Multisource query is best suited to be the primary source (not necessarily your Primary DSN). Data from the alternate databases are moved over into temp tables to the primary database via inserts. Once the data is virtually moved to a single host, the report processes like normal by joining the warehouse and temp tables to perform the query.
For some good tips on optimizing this process for certain platforms, check out this tech note.
The above data juggling only happens when it needs to, which is only when elements from different datasources are present on the report. If your report contains only objects from a single source, then the report will just execute completely natively on that source like normal. Data isn’t superfluously moved.
Logically, a Multisource implementation should just be a physical extension of your existing model. In addition to having different tables from each DSN, you can also have tables from multiple DSNs simultaneously. It’s a good practice to keep your common dimensions in each place for performance, because this will be less data that has to be moved around. In Warehouse Catalog and Architect, you can designate that a table has multiple sources and it’s icon/color will change. When the SQL Engine goes to resolve the query, it will take this into consideration and save the trouble of moving dimensions around.
For a much more in depth look at additional considerations, check out this tech note.
Prompts are queries too, and although Prompt Caching goes a long way to help speed them up, element searches are not cached. In some cases, a heavy database load can adversely affect prompt performance and cause an overall sluggish feel. One way to alleviate this issue is to load all of your dimensions on a separate, lighter database. The dimensions should exist in both the warehouse and this dedicate prompt database so no data ever needs to actually be ported around. However, individual lookups will now source from this faster, smaller database instead of from the main warehouse. This architecture gives you faster lookup response and an overall zippier feel for your end users.
Aggregates and Details
One of my favorite things to use Multisource for is to take advantage of multiple platform’s strengths. Say for example that you have a lot of very detailed data stored on a Hadoop cluster. While you wouldn’t want to run primary reports from this platform, traditional warehouses may struggle to match Hadoop’s scale. Keeping aggregate data on a workhorse data warehouse appliance and details on Hadoop lets each platform excel in what they were designed for. Reports will execute from the primary warehouse at aggregate levels, but should a user require details, the queries will execute on the separate detailed level cluster. This approach is not only better for performance and data availability, but also much more cost effective by not overloading expensive appliances with detailed data.
In some cases, the operational system will contain real time data that MicroStrategy could potentially access. With a little planning, you could use Metadata Partitioning to send current day queries to the operational system for real time reporting and historical queries to the warehouse. Multisource can also merge the two together onto a single report for an absolute complete picture. This approach (assuming the Operational System can handle the traffic and load), is potentially easier to setup and maintain than developing a real time ETL solution. Multisource Partitioning requires MicroStrategy 9.2+
I’m not exactly sure why, but if you want to access SalesForce.com data, you need to own a Multisource license. It simply adds the option to your Data Source Type list and allows you to report directly off of SalesForce.com data without having to do any additional ETL.
For more information on setting up SalesForce.com connectivity, check out this tech note.
Proof of Concepts
One of the ways I like to use Multisource is for doing proof of concept projects for new subject areas. This allows me to simply incorporate their data with existing warehouse data before ETL. It allows the business users to experience the data first hand and flush out any additional needs, requirements or issues prior to full scale development. This approach also supports Departmental BI which is a process of quickly allowing individual groups to build and connect their data without full life cycle traditional warehousing. This gets quick wins and provides a clear upgrade path to a traditional Enterprise approach down the road.