How Does Business Intelligence Improve the Sales Process?
Source: Keep It Simple [link]
LucidEra’s CEO Ken Rudin was asked this question by AllBusiness.com’s Sales Advisor Keith Rosen. Check out the 2 minute video here.
CIO or CTO
Source: Frank Buytendijk Blog [link]
I recently had a very interesting discussion about the role of IT management. It is the current best practice to define the role of the CIO as a change agent within the organisation. He/she needs to help the organisation to use information, particularly performance information, to gain competitive advantage by leveraging the information asset to improve decison making at all levels in the organisation, from the strategic level right down to all operational levels of the organisation. This is partly about technology, but much more about organisational change — about governance and about enabling new business models.
Traditionally, a CTO’s focus is much more narrow, and focused exclusively on technology. How do I optimise the technology we own? How do I drive cost out of our technology platform? What hardware and networks do we have? Who should be our strategic suppliers of technology? How do I improve the reliability of our technology?
However, I am not sure if I think this is the right way forward.
In my opinion CIOs would do good to become more of a CTO for the coming years. Here’s why:
- Consumerization of IT. The colleagues in the business may know more about IT than you and have very sophisticated requirements. If they talk about making mash-ups and you think it involves gravy and potatoes, you will have a credibility problem. If the IT dept and the CIO are not IT savvy, the business will not only go outside and buy solutions, but they will create their own solutions. I would want my CIO not the best business person, but the best IT guy, just as I would want my CMO to be a marketing expert.
- IT is increasingly about value chain integration. Connecting processes, systems, people across suppliers, customers, employees, everyone. Much of the IT budget will be spent outside the organization; more than 50% will be no exception. This requires a strong infrastructure focus for the CIO, to manage the common foundation. For the future, applications will increasingly leave the IT realm; 2.0 techniques plus service-oriented architectures create business process frameworks, in which power users largely implement and maintain applications themselves.
- Most new business models are IT-driven anyway. Allowing consumers to configure their own orders (mass customization) and track the progress requires advanced web-based applications. Smarter analysis of customer data is achieved through advanced CRM analytics. Many new services are about supplying additional information to consumers; in other words, business intelligence becomes a product and value proposition of its own.
NT-SIZE: 12pt; FONT-FAMILY: Times; mso-fareast-font-family: ‘MS Mincho’; mso-fareast-language: JA; mso-ansi-language: EN-US; mso-bidi-language: AR-SA; mso-bidi-font-family: ‘Times New Roman’”>In all these trends the CIO has the largest business impact, being a techie!
ONT-SIZE: 12pt; FONT-FAMILY: Times; mso-fareast-font-family: ‘MS Mincho’; mso-fareast-language: JA; mso-ansi-language: EN-US; mso-bidi-language: AR-SA; mso-bidi-font-family: ‘Times New Roman’”>–frank
Metrics your mother warned you about
Source: datadoodle [link]
I could easily find a parking place in Berkeley on Saturday! What could it mean? On a normal afternoon in Berkeley’s Gourmet Ghetto, I usually find just one space open, and often I have to drive around the block once…. Ah, it’s the end of Easter week and a bunch of the university people have gone away. So much for bigger meanings like recession, breaking news on TV or sunspots.
Sometimes such offbeat indicators do mean something. How would we know if the U.S. economy has begun sliding into a severe recession? By a surge in the number of eBay items for sale, according to one article in the March 23 New York Times. That would indicate, I assume, that people had become unusually motivated to liquidate whatever belongings they could.
These metrics measure hard-to-reach places, internal or external. They may be eccentric, creative and even strange.
That’s what I’m writing about next for TDWI’s BI This Week.
So far, my sources include these people:
- Stacey Barr, a woman in Australia who comes recommended by performance-dashboard expert Stephen Few.
- The ever-inventive Zach Gemignani of Juice Analytics, whose recent blog post on “Franken-measures” helped inspire the idea.
Oracle is dragging down other canaries
Source: Data Doghouse - performance management, business intelligence, and data warehousing [link]
Oracle (ORCL ) released its fiscal third-quarter results last Wednesday matching analyst overall expectations but spooking people with lighter than expected new application software licenses. The next day its stock was down about 8%.
With a jittery market, people are looking for any sign of a recession or a recovery depending on whether you are a bear or a bull. Analysts started downgrading other software stocks or decreasing price targets after the Oracle earnings announcement. Companies suffering guilt by association include: SAP (
That’s a wide spectrum of software firms but the general feeling (bear case) is that a recession is going to reduce IT spending. There are two key considerations, besides your normal due diligence, when deciding to buy, hold or sell on a software stock.
First, how much of its software sales are generated from the financial services sector. Financial firms are generally significant IT systems buyer but are very much impacted by this economic climate. One may argue that any merger & acquisitions (M&A) activity creates IT sales opportunities but that will not make up for projects delayed or cancelled. Many of these firms will be struggling with new regulations, liquidity issues and adjusting to changing or disappearing revenue streams rather than implementing major software projects.
Second, software firms that are dependent on big-ticket projects for their sales may be impacted by IT/business groups delaying these purchases or even outright cancelling them for this year. Do not fool yourself into thinking that cool new software is too important to delay. Most everything can be delayed if business conditions warrant.
On the flip side, downturns may be the best time for companies to invest for the future. You cannot cut yourself into growing your business. Slashing costs may help your bottom line in the short-term but it rarely ever helps you increase sales or improve the customer experience. Some companies may follow this advice but many will just be stalling or slashing costs. It’s the deer in the middle of the road at night syndrome – too frightened with the oncoming car to just jump out of the way.
Mark Veverka in his Barron’s “Plugged In” column this weekend titled “Oracle: No Profit of Doom” states “A disappointing top line is never a good thing for a software company, but Oracle’s results weren’t a huge shock. They were more a function of the economy than anything amiss at the Redwood Shores, Calif., outfit. But investors didn’t see it that way… I’d suggest that investors pay more attention to Oracle’s earnings expansion going forward, rather than obsess about revenue growth in a sluggish economy.”
The shares of software companies are already down 13.1% YTD as reflected in the iShares S&P GSTI Software Index (IGV). The downturn may continue until people see a light at the end of the tunnel. Historically, the end of the summer has been an opportune time to buy software firms selling to corporate customers.
Ya se puede descargar Pentaho 1.7.0 M1
Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]
La nueva versión de Pentaho (1.7) que acaba de ser presentada y lista para descargar, viene con algunas interesantes nuevas funcionalidades (aunque se encuentra en Beta y tiene algunos bugs conocidos).
Descargar version 1.7.0
Release Notes
Estas son sus principales novedades:
- Upgrade a Kettle engine to 3.0.1
- Upgrade a Mondrian engine3.0
- Upgrade Classic Reporting engine a 0.8.9.3
- Permite añadir “extra” roles o nombres de usuario arbitrarios para ser añadidos a las busquedas de LDAP.
- Upgrade ACEGI a 1.0.5
- Resolución de Muchos bugs
Tambien acaba de salir la nueva version de Kettle 3.02.
On-Demand BI Curious?
Source: Keep It Simple [link]
Software-as-a-service (SaaS) business intelligence got a lot of attention last week over at CIO.com. If the concept of business analytics being delivered as an on-demand service is still foreign to you, or if you’re simply on-demand BI curious, be sure to check out this article, Business Intelligence and On-Demand: The Perfect Marriage? Thomas Wailgum reviews […]
How To Show the Last Update Date of the SSAS Cube
Source: Miky Schreiber's Blog - BI [link]
We got many client requests for the ability to show in their web sites the “last updated” date of the data.
It doesn’t matter how you show the data of the SSAS - the customers will always want
to know for which date the data is true.
My solution includes a ASP.NET 2.0 web site that uses the AMO class libary. It takes
the date from the server and shows it to the user.
What you need to do is:
1. Open a new ASP.NET web site using Visual Studio 2005/8.
2. Add the AMO dll (Microsoft.Analysis Services). You’ll find it in the SSAS server.
3. In the already-made default.aspx page, just add one Label.
4. Add a configuration file which will hold the name of the SSAS server. That way,
when you install the site from the development environment to the production environment,
you’ll only have to change this file. Call this file config.xml and write in it the
following:
<?xml version=”1.0″ encoding=”utf-8″ ?>
<ServerName>YourServerFullNameHere</ServerName>
5. In the code-behind file (default.aspx.cs) write the following code instead of what
you already have there:
using System;
using System.Data;
using System.Configuration;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using AMO = Microsoft.AnalysisServices;
using System.Xml;
public partial class _Default : System.Web.UI.Page
{
protected void Load_Page(Object sender, EventArgs e)
{
Label1.Text = GetCubeUpdateDate(Request.QueryString[”DBName”],Request.QueryString[”CubeId”]);
}
private string GetCubeUpdateDate (string dbName, string cubeId)
{
using (AMO.Server asServer = new AMO.Server())
{
asServer.Connect(”Data Source=” + GetAnalysisServerName());
AMO.Database db = asServer.DataBases.FindByName(dbName);
if (db == null)
{
return “DB Name not found”;
}
AMO.Cube cube = GetCubeById(cubeId, db);
if (cube == null)
{
return “Cube Name not found”;
}
DateTime lastProcessed = cube.LastProcessed;
return lastProcessed.Day.ToString() + “/” + lastProcessed.Month.ToString()
+ “/” + lastProcessed.Year.ToString();
}
}
private string GetAnalysisServerName ()
{
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(Request.PhysicalApplicationPath + “config.xml”);
return xmlDoc.GetElementsByTagName(”ServerName”).Item(0).InnetText;
}
private AMO.Cube GetCubeById (string cubeId, AMO.Database db)
{
foreach (AMO.Cube cube in db.Cubes)
{
if (cube.ID.Equals(cubeId))
{
return cube;
}
}
return null;
}
}
Eventhough the code is self-explained, here are some points referring it:
-
I chose not to include the server name in the web.config file because I like to seperate
application-related configuration and web configuration. -
If you want you can get the cube name from the user (in the query string) and then
the code is even shorter - just get the cube like I got the database. -
I wanted to show the date in the format DD/MM/YYYY, so that’s why I did the long return
statement in the GetCubeUpdateDate method. If you want to return the date in the MM/DD/YYYY
format you can use the lastProcessed.GetShortDateFormat() method. -
Note that when you publish the web site you need to create a dedicated virtual folder
in the IIS. -
The user uses this site in the following way: All he need to do is to create a frame
with this site’s address as its source and add it the DBName & CubeId in the query
string. In SharePoint it’s even easier - the uses only need to create a page shower
web part.
enjoy.
IDC ANUNCIA LAS 10 TENDENCIAS QUE MARCARAN EL MERCADO TI DE CHILE EN 2008
Source: Business Intelligence Argentina [link]
Pentaho Commercial Open Source Business Intelligence
Source: Business Intelligence Argentina [link]
Data Mashups, Web 2.0 - Aula365® recibió el premio Manuel Sadosky 2007 a la innovación tecnológica
Source: Business Intelligence Argentina [link]
IBM - SOA News - Soa + Web 2.0 | QEDWiki: Identificando a Web 2.0 en SOA
Source: Business Intelligence Argentina [link]
Take advantage of FE caching to optimize MDX performance
Source: Microsoft OLAP by Mosha Pasumansky [link]
We usually treat caching system of Analysis Services as a black box, trusting that it will do the “right thing” to optimize the execution. And this is how things should be in the ideal world, caching system should be completely transparent to the end user. However, even though MDX query optimizer and caching system are very sophisticated, they are not perfect. In this article we will see how with very simple MDX rewrites, we can take better advantage of caching, and increase performance of MDX calculations significantly.
(In order to follow examples in this article, you will need to use MDX Studio tool)
As an example we will use very simple statistical analysis of the data, where we would apply “Three sigma rule” to find outliers across combination of attributes. It is based on the fact that for data which conforms normal distribution almost all of the values (99.7% of them) will lay within 3 standard deviations of the mean. So the values which are farther from the mean by more than 3 standard deviations can be considered outliers. (There are of course more advanced methods to find outliers in the data, but for the purpose of this article this is representative enough).
We will start with the analysis of Internet Sales by day inside Adventure Works cube. The MDX to find out “farther than three sigma” days seems straightforward
with
member sales_avg as Avg ([Date].[Date].[Date], [Internet Sales Amount]), format_string = 'currency'
member sales_stdev as StDev([Date].[Date].[Date], [Internet Sales Amount]), format_string = 'currency'
select { [Internet Sales Amount] } on 0
, filter([Date].[Date].[Date], abs([Internet Sales Amount]-sales_avg) > 3*sales_stdev) on 1
from [Adventure Works]
If we execute this query we will get the list of 10 outlier dates. But the query takes 7.5 seconds to execute. This is horrible ! 7.5 seconds to go through 1158 members seem completely unreasonable. To get a better idea about what’s going on, let’s take a look at some execution statistics collected by MDX Studio:
Time : 7 sec 417 ms Calc covers : 1164 Cells calculated : 1344448 Sonar subcubes : 3 SE queries : 1159 Cache hits : 1159 Cache misses : 2 Cache inserts : 2 Cache lookups : 1161
Now we can understand what was going on here. For each one of these 1158 members, the engine recalculated both average and standard deviation, even though they were exactly the same at each point (it is easy to verify through debugger in MDX Studio). The problem is that the engine didn’t figure out itself that average and standard deviation were the same for all dates, so we need to help it a little bit. We can redefine sales_avg as a calculation which goes up to All member in Date attribute. This way regardless at which date we are looking right now, the cell coordinate will shift to the All member, and will be computed only once, because all the other times it is referenced it can be answered from FE cache. Calculation rewritten with this trick in mind will look the following:
with
member sales_avg_ as Avg ([Date].[Date].[Date], [Internet Sales Amount]), format_string = 'currency'
member sales_stdev_ as StDev([Date].[Date].[Date], [Internet Sales Amount]), format_string = 'currency'
member sales_avg as ([Date].[Date].[All Periods],sales_avg_), format_string = 'currency'
member sales_stdev as ([Date].[Date].[All Periods],sales_stdev_), format_string = 'currency'
select { [Internet Sales Amount] } on 0
, filter([Date].[Date].[Date], abs([Internet Sales Amount]-sales_avg) > 3*sales_stdev) on 1
from [Adventure Works]
If we execute it now, it is done in mere 0.1 fraction of second. The MDX Studio stats look like following
Time : 109 ms Calc covers : 9 Cells calculated : 4642 Sonar subcubes : 3 SE queries : 2 Cache hits : 4 Cache misses : 2 Cache inserts : 2 Cache lookups : 6
This is much better. Instead of SE query per cell, we got only 2 SE queries (one for axis and one for avg/stdev calculation). There are also significantly less cells calculated, since both average and standard deviation are calculated only once now and served from the cache.
Let’s make the problem a little bit complex now. We computed statistics across all 4 years, and discovered that all the outlier dates were in 2004. This is probably attributed to the growth over time. So let’s compute statistics differently now, instead of computing them for all dates in 4 years, let’s compute them within each month, and find outliers within the month. The straightforward approach to this can be coded in MDX as following.
with
member sales_avg as Avg ([Date].[Calendar].Parent.children, [Internet Sales Amount]), format_string = 'currency'
member sales_stdev as StDev([Date].[Calendar].Parent.children, [Internet Sales Amount]), format_string = 'currency'
select { [Internet Sales Amount], sales_avg, sales_stdev } on 0
, filter([Date].[Date].[Date], abs([Internet Sales Amount]-sales_avg) > 3*sales_stdev) on 1
from [Adventure Works]
Executing this query now finds two outliers:
| Internet Sales Amount | sales_avg | sales_stdev | |
| July 22, 2001 | $38,241.29 | $15,270.59 | $6,747.86 |
| December 8, 2001 | $55,454.04 | $24,371.87 | $10,146.40 |
And the MDX Studio stats will be
Time : 704 ms Calc covers : 1206 Cells calculated : 38792 Sonar subcubes : 41 SE queries : 1198 Cache hits : 1198 Cache misses : 2 Cache inserts : 2 Cache lookups : 1200
Again, the stats don’t look good. Too many SE queries, too many cells calculated. How can we apply the same technique as before ? Moving coordinate to the All dates will be incorrect now, because average and standard deviations are now different for every month. Therefore we need to move coordinates to the month level. The most natural way of doing it is simple move call to the Parent function from within the Avg/Stdev calls into the tuple coordinate shift transformation. Here is how the resulting query will look like:
with
member sales_avg_ as Avg ([Date].[Calendar].children, [Internet Sales Amount]), format_string = 'currency'
member sales_stdev_ as StDev([Date].[Calendar].children, [Internet Sales Amount]), format_string = 'currency'
member sales_avg as ([Date].[Calendar].Parent,sales_avg_), format_string = 'currency'
member sales_stdev as ([Date].[Calendar].Parent,sales_stdev_), format_string = 'currency'
select { [Internet Sales Amount], sales_avg, sales_stdev } on 0
, filter([Date].[Date].[Date], abs([Internet Sales Amount]-sales_avg) > 3*sales_stdev) on 1
from [Adventure Works]
The stats from this rewrite will be
Time : 203 ms Calc covers : 162 Cells calculated : 6954 Sonar subcubes : 117 SE queries : 78 Cache hits : 78 Cache misses : 2 Cache inserts : 2 Cache lookups : 80
Which is much more reasonable result.
MDX and partitioning
Source: Microsoft OLAP by Mosha Pasumansky [link]
Partitions have been a built-in feature of Analysis Services ever since the first version. Naturally, all the engine subsystems, including the MDX query optimizer, work well with partitions. There are rare cases, however, when over-aggressive prefetching can generate query plan which is not optimal with respect to the partitioning scheme. Today we will review one such scenario and see how MDX could be rewritten to suit the partitioning scheme better.
Let’s consider scenario where we need to determine the last date for which there is data in certain measure group. The example will be built around Adventure Works sample database, using Date dimension and Internet Sales Amount as a measure.
One more thing to note: while writing this post I used MDX Studio which simplifies many operations, so when below I say “look at the set”, it means inspecting it in the Watch window of MDX Studio, when I say “clear the cache”, it means using MDX Studio’s cache clear button etc. The only other tool that needs to be run is Profiler, since AS traces are not yet integrated into MDX Studio, but the MDX Studio development team is working on this feature. Also, in order to be able to run MDX Script statements from within MDX Studio, I added “Cube=Adventure Works” in the Connection Properties field of the Connect dialog.
Due to limitations of sqlblog.com, in all examples below remove the space before the square braket in the [Date ] fragments
The most straightforward way to compute the last date is by using the following formula:
CREATE SET CurrentCube.LastDate as Tail(NonEmpty([Date ].[Date ].[Date ], [Measures].[Internet Sales Amount]), 1)
We can see that the result set will have a single member in it - July 31, 2004. But if we execute this statement over the clean cache, we can see the following events in the trace:
Query Begin 0 - MDXQuery MDX Studio v0.2.6.0 Progress Report Begin 14 - Query Started reading data from the ‘Internet_Sales_2001′ partition. Progress Report Begin 14 - Query Started reading data from the ‘Internet_Sales_2002′ partition. Progress Report Begin 14 - Query Started reading data from the ‘Internet_Sales_2003′ partition. Progress Report End 14 - Query Finished reading data from the ‘Internet_Sales_2001′ partition. Progress Report Begin 14 - Query Started reading data from the ‘Internet_Sales_2004′ partition. Progress Report End 14 - Query Finished reading data from the ‘Internet_Sales_2002′ partition. Progress Report End 14 - Query Finished reading data from the ‘Internet_Sales_2003′ partition. Progress Report End 14 - Query Finished reading data from the ‘Internet_Sales_2004′ partition. Query End 0 - MDXQuery
We see that all partitions got queried, even though the last non empty date is in 2004, and therefore it should’ve been enough to query just the Internet_Sales_2004 partition. Unfortunately, the query optimizer gets carried away in this example, decides to ignore the outer Tail(…, 1) and executes NonEmpty over all dates literally over all dates. So, we need a different strategy here. The second obvious strategy for finding last non empty date is the recursive approach - start from the last date in the dimension and move back by one date until we run into a date which has data. The formula for that is
CREATE
MEMBER CurrentCube.LastDateIndex AS
Iif(
IsEmpty([Measures].[Internet Sales Amount]),
[Date ].[Calendar].PrevMember,
Rank([Date ].[Calendar].CurrentMember, [Date ].[Calendar].[Date ])
)
SET LastDate AS [Date ].[Calendar].[Date ].Item((LastDateIndex, Tail([Date ].[Calendar].[Date ],1).Item(0))-1)
Here we implemented the recursive logic inside calculated member LastDateIndex, and then we position it on the last date (using Tail(1) over level of dates for that). The result is the same as before, but trace looks very different:
Query Begin 0 - MDXQuery MDX Studio v0.2.6.0 Query Subcube 2 - Non-cache data August 31, 2004 Query Subcube 2 - Non-cache data August 30, 2004 Query Subcube 2 - Non-cache data August 29, 2004 Query Subcube 2 - Non-cache data August 28, 2004 Query Subcube 2 - Non-cache data August 27, 2004 Query Subcube 2 - Non-cache data August 26, 2004 Query Subcube 2 - Non-cache data August 25, 2004 Query Subcube 2 - Non-cache data August 24, 2004 Query Subcube 2 - Non-cache data August 23, 2004 Query Subcube 2 - Non-cache data August 22, 2004 Query Subcube 2 - Non-cache data August 21, 2004 Query Subcube 2 - Non-cache data August 20, 2004 Query Subcube 2 - Non-cache data August 19, 2004 Query Subcube 2 - Non-cache data August 18, 2004 Query Subcube 2 - Non-cache data August 17, 2004 Query Subcube 2 - Non-cache data August 16, 2004 Query Subcube 2 - Non-cache data August 15, 2004 Query Subcube 2 - Non-cache data August 14, 2004 Query Subcube 2 - Non-cache data August 13, 2004 Query Subcube 2 - Non-cache data August 12, 2004 Query Subcube 2 - Non-cache data August 11, 2004 Query Subcube 2 - Non-cache data August 10, 2004 Query Subcube 2 - Non-cache data August 9, 2004 Query Subcube 2 - Non-cache data August 8, 2004 Query Subcube 2 - Non-cache data August 7, 2004 Query Subcube 2 - Non-cache data August 6, 2004 Query Subcube 2 - Non-cache data August 5, 2004 Query Subcube 2 - Non-cache data August 4, 2004 Query Subcube 2 - Non-cache data August 3, 2004 Query Subcube 2 - Non-cache data August 2, 2004 Query Subcube 2 - Non-cache data August 1, 2004 Progress Report Begin 14 - Query Started reading data from the 'Internet_Sales_2004' partition. Progress Report End 14 - Query Finished reading data from the 'Internet_Sales_2004' partition. Query Subcube 2 - Non-cache data July 31, 2004 Query End 0 - MDXQuery
Now we see lots of non-cached Query Subcube requests for the dates going from August 31, 2004 down to July 31, 2004. Interestingly, only the last one triggers reading of the partition, since for the others, partition autoslice automatically detects that they don’t belong to any partition. So in certain sense this is a good result, since we only touched one partition, but in another sense, this query plan with lots of Query Subcube requests doesn’t look good. We were lucky, that in Adventure Works there were only 31 days at the end of Date dimension without data, so recursion went 31 step only. But it is not uncommon to have Date dimensions looking several years ahead, in which case recursion could go on for thousands of steps, and overhead of thousands of Query Subcube requests could become substantial, not to mention the overhead of the depth of recursion.
This leads us to another approach. Instead of iterating by days, we can iterate by partitions. Since in Adventure Works partitioning scheme is by years, we will iterate by years, and then find the last non empty date within a year. The MDX that implements this approach is below:
CREATE MEMBER CurrentCube.LastYearIndex AS Iif( IsEmpty([Measures].[Internet Sales Amount]), [Date ].[Calendar Year].PrevMember, Rank([Date ].[Calendar Year].CurrentMember, [Date ].[Calendar Year].[Calendar Year]) ) SET LastYear AS [Date ].[Calendar Year].[Calendar Year].Item((LastYearIndex, [Date ].[Calendar Year].LastChild)-1) SET LastDate AS Tail(NonEmpty([Date ].[Date ].[Date ].MEMBERS, (LastYear, [Measures].[Internet Sales Amount])), 1)
Again, this gives us the desired result, and now the trace looks like following:
Query Begin 0 - MDXQuery MDX Studio v0.2.6.0 Progress Report Begin 14 - Query Started reading data from the 'Internet_Sales_2004' partition. Progress Report End 14 - Query Finished reading data from the 'Internet_Sales_2004' partition. Query Subcube 2 - Non-cache data 00000000,000,00000,00,000000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10 Progress Report Begin 14 - Query Started reading data from the 'Internet_Sales_2004' partition. Progress Report End 14 - Query Finished reading data from the 'Internet_Sales_2004' partition. Query Subcube 2 - Non-cache data 00000000,000,00000,00,010000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10 Query End 0 - MDXQuery
We eliminated lots of Query Subcube requests, but we still have two, and now each one triggers query of partition. Why there are two of them ? This is easily answered by inspecting the Query Subcube granularity bitmasks. We see that the difference between them is in the Date dimension. First Query Subcube doesn’t have granularity on Date attribute, while the second one has. It is clear now, that the first Query Subcube request is triggered by IsEmpty([Measures].[Internet Sales Amount]) check inside Iif - because it happens at the Year granularity, and the second Query Subcube is due to NonEmpty over dates with the filter on the specific year.
We don’t like the fact that we go to the partition twice, essentially for the same data but on different granularity. So we can do a little trick in MDX. Since we will have to go to the Date granularity within a year anyway, let’s do it sooner rather than later - and expand granularity already inside Iif check. The resulting MDX will look like following:
CREATE MEMBER CurrentCube.LastYearIndex AS Iif( IsEmpty(Aggregate(Descendants([Date ].[Calendar], [Date ].[Calendar].[Date ]), [Measures].[Internet Sales Amount])), [Date ].[Calendar Year].PrevMember, Rank([Date ].[Calendar Year].CurrentMember, [Date ].[Calendar Year].[Calendar Year]) ) SET LastYear AS [Date ].[Calendar Year].[Calendar Year].Item((LastYearIndex, [Date ].[Calendar Year].LastChild)-1) SET LastDate AS Tail(NonEmpty([Date ].[Date ].[Date ].MEMBERS, (LastYear, [Measures].[Internet Sales Amount])), 1)
Let’s see what we get in the trace now:
Query Begin 0 - MDXQuery MDX Studio v0.2.6.0 Progress Report Begin 14 - Query Started reading data from the 'Internet_Sales_2004' partition. Progress Report End 14 - Query Finished reading data from the 'Internet_Sales_2004' partition. Query Subcube 2 - Non-cache data 00000000,000,00000,00,011010000001100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10 Query Subcube 1 - Cache data 00000000,000,00000,00,010000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10 Query End 0 - MDXQuery
This is perfect - exactly what we wanted. The second Query Subcube which used to be Non-cache data, now turned into cached one, and there is only one query to only one partition going on.
This post was written as a result of discussion on sql.ru forum
Optimizing Count(Filter(…)) expressions in MDX
Source: Microsoft OLAP by Mosha Pasumansky [link]
As the readers of my blog know, bulk evaluation mode (called “block computation mode” in Katmai) delivers much better performance in MDX than the cell-by-cell evaluation mode. Therefore the most important optimization technique with MDX in Analysis Services is to rewrite MDX in such a way that makes block computations possible. Easy to say, but not always easy to do. With the release of Katmai’s CTP5, Microsoft published the BOL article outlining conditions when block computations are and are not possible. Chris Webb picked on this article and mentioned in his blog, that “the list of set functions is a bit limited (where is Filter?)“. In reply I said that I didn’t think that Filter function was that common inside the MDX calculations, but Greg Galloway immediatelly came up with a good example, one that involves Count(Filter(…)). This is indeed a common calculation - every time we want to know how many entities are there which satisfy certain condition. (Greg’s example was to find out how many physicians performed 10 or more cases during certain time period.)
Let’s build an example using Adventure Works sample cube and see how we can optimize it. In Adventure Works terms, our task would be to find the number of products which had more than 5 orders placed over the Internet. Such calculation can be written as
Count(Filter([Product].[Product].[Product], [Measures].[Internet Order Quantity] > 5))
Indeed, neither in AS2005 nor in AS2008 the Filter function is optimized to work in the block computation mode, therefore the query involving this calculation will execute in the cell by cell mode:
Note: Here and below, change [Date ] to remove trailing space after Date in order to run the query. I had to insert this space due to a bug with the blog hosting software (Community Server)
WITH MEMBER [Measures].[High Volume Products Count] AS Count(Filter([Product].[Product].[Product], [Measures].[Internet Order Quantity] > 5))SELECT [Customer].[Customer Geography].[Country] ON 0, [Date ].[Calendar].[Date ].MEMBERS ON 1FROM [Adventure Works]WHERE [Measures].[High Volume Products Count]
If we execute this query in MDX Studio, we will get the following stats:
Time : 30 sec 781 msCalc covers : 4Cells calculated : 4217436Sonar subcubes : 2SE queries : 1Cache hits : 1Cache misses : 1Cache inserts : 1Cache lookups : 2Memory Usage KB : 4160
The best hint here that this query indeed executed in the cell-by-cell mode (beyong the slow execution time) is value of “Cells calculated” perfmon counter. Now, the way most people approach optimization for such MDX is trying to reduce the number of cells to iterate. One way to do it is to eliminate manually all the empty cells from Filter. I.e., if value of [Internet Order Quantity] is NULL for certain product, it is definitely less than 5. With this in mind, one possible rewrite for the query would be
WITH MEMBER [Measures].[High Volume Products Count] AS Count(Filter( Exists([Product].[Product].[Product],,"Internet Sales") ,[Measures].[Internet Order Quantity] > 5)) SELECT [Customer].[Customer Geography].[Country] ON 0 , [Date ].[Calendar].[Date ].MEMBERS ON 1 FROM [Adventure Works] WHERE [Measures].[High Volume Products Count]
Here Exists inside Filter eliminates the empty space. Let’s run this again in MDX Studio and look at the new stats
Time : 12 sec 46 ms Calc covers : 954 Cells calculated : 50036 Sonar subcubes : 7900 SE queries : 7899 Cache hits : 7899 Cache misses : 1 Cache inserts : 1 Cache lookups : 7900 Memory Usage KB : 43936
The time indeed decreased, from 30 seconds to 12 seconds. This is good, but something doesn’t add up. The number of cells went down from 4 million to only 50 thousands, i.e. about 85 times, yet the execution time decreased by only 2 times. The explanation is that putting Exists (or NonEmpty) inside MDX calculation is usually a bad idea - since now for every cell, there is an SE query being sent (it can be seen through “Query Subcube” event in trace too). In our case there were 7900 such queries sent. Because the set inside Exists was always the same and the context was fixed, only 1 such query actually had to go to disk, and other 7899 hit the cache, yet the overhead of issuing SE query is non-trivial, even when it is answered from the cache.
Conclusion is that even though we get some performance gain from this approach, it doesn’t play well with the rest of the system, and only drives us further from the goal of switching to the superior block computation mode. We need to rewrite the calculation in such a way that we eliminate Filter. Fortunately, it is possible to do. Let’s recall that Count function returns number of tuples in the set, and Filter return set of tuples which satisfy certain condition. I.e. we are counting how many tuples satisfy a condition. If we convert count to sum, and sum up 1’s every time when the condition is met and 0’s every time when the condition is not met, we will get the same result. I.e.
Count(Filter(set, condition)) = Sum(set, Iif(condition, 1, 0))
Using this formula we can rewrite our calculation as following:
WITH MEMBER [Measures].[High Volume Products Count] AS Sum( [Product].[Product].[Product], Iif([Measures].[Internet Order Quantity] > 5,1,0)) SELECT [Customer].[Customer Geography].[Country] ON 0 , [Date ].[Calendar].[Date ].MEMBERS ON 1 FROM [Adventure Works] WHERE [Measures].[High Volume Products Count]
Executing this query yields the following stats:
Time : 6 sec 375 ms Calc covers : 4 Cells calculated : 6948 Sonar subcubes : 1 SE queries : 1 Cache hits : 1 Cache misses : 1 Cache inserts : 1 Cache lookups : 2 Memory Usage KB : 0
This is definitely much better. The time is down to 6 seconds, and we can tell that within every cell the Sum works very efficiently, because we only see 6948 cell calculated (this is the exact number of cell in the resulting cellset). Yet, we are not in true block computation mode yet. We are still running Sum for every cell, instead of computing the entire query in one operation. What prevents us from doing it now ? Now the problem is with Iif function inside the Sum. I have written about Iif function and its interaction with block mode in the past. If we reread that article, we will see that we are in the scenario where condition inside Iif looks at the cell values and not at the attribute coordinates, and the only thing we can do here is to have one of the branches to return NULL. And it probably makes sense to define our calculated member to return NULL instead of 0 when there are no products which satisfy our condition. After this rewrite we get
WITH MEMBER [Measures].[High Volume Products Count] AS Sum( [Product].[Product].[Product], Iif([Measures].[Internet Order Quantity] > 5,1,NULL)) SELECT [Customer].[Customer Geography].[Country] ON 0 , [Date ].[Calendar].[Date ].MEMBERS ON 1 FROM [Adventure Works] WHERE [Measures].[High Volume Products Count]
This gets us down to 4 seconds, which is yet another improvement, but still doesn’t yet reach the best execution plan. Now it is time to use the performance optimization hints. Looking at the expression Iif([Measures].[Internet Order Quantity] > 5,1,NULL) we observe that it is guaranteed to be NULL when [Measures].[Internet Order Quantity] is NULL. Therefore we can define the MDX script flavor of NON_EMPTY_BEHAVIOR for it. Since NEB cannot be defined on subexpressions, we will separate it into special calculated measure. Let’s write the following fragment inside MDX Script:
CREATE HIDDEN Summator; [Measures].[Summator] = Iif([Measures].[Internet Order Quantity] > 5,1,NULL); NON_EMPTY_BEHAVIOR([Measures].[Summator]) = [Measures].[Internet Order Quantity];
And then use it inside our calculation:
WITH MEMBER [Measures].[High Volume Products Count] AS Sum([Product].[Product].[Product], [Measures].[Summator]) SELECT [Customer].[Customer Geography].[Country] ON 0 , [Date ].[Calendar].[Date ].MEMBERS ON 1 FROM [Adventure Works] WHERE [Measures].[High Volume Products Count]
Finally, when we execute this query, it finishes in about 0.2 second (218 milliseconds). This is the performance we expect to observe from the block computation mode. From 32 seconds to 0.2 seconds - improvement of over 160 times !
Now the even better news are, that starting with Katmai November CTP5 (which was released couple of days ago, so I finally can publicly speak about it), the trick with explicitly defining NON_EMPTY_BEHAVIOR is not required, the engine recognizes it itself. Therefore, even our previous query (the one which took 4 seconds in AS2005) returns in 0.2 seconds in AS2008. (This improvement in query optimizer in Katmai is just a tiny little tip of the iceberg around improvements in block computation query plans, and I intend to cover this subject in more depth in the upcoming blogs, if I will have enough time for that).
How to detect subselect inside MDX calculations (aka Multiselect in Excel 2007)
Source: Microsoft OLAP by Mosha Pasumansky [link]
The subject of multiselect friendly calculations is a popular one in forums. I have written about it before here, here and here. In February 2008 I will write a blog which will give a definite answer on how AS2008 will deal with it (which is follow up on this post). But in the meantime, questions about AS2005 keep coming. By now, most people realize that it is possible to detect presence of set in WHERE clause by using EXISTING operator, which takes care of multiselect queries generated by Excel 2003 and other tools. However, Excel 2007 for multiselect uses subselects which are stealthier, their presence cannot be detected neither by EXISTING nor by any other MDX function. The standard answer for detecting subselects always was to use query scoped named sets, because they are subject of implicit autoexist with subselects. However, this approach isn’t applicable to Excel 2007, since it is up to Excel 2007 to generate MDX queries, and the user can only control the expression for the calculation inside MDX Script.
So the question that gets asked is “Is it possible in AS2005 to detect and obtain subselect restriction from the MDX calculation?”. Up until now the answer to this question was “No”. I have thought about it, and I found a way how to make the answer to this question “Yes”. I must warn, that the solution is neither elegant nor efficient, and probably is mostly of the theoretical interest. I wouldn’t recommend using it in the real implementation. Still, it is interesting one in my opinion.
Let’s start with the assumption, that the lowest attribute on which multiselect is possible has only few members. More precisely it should have 63 or less members. We will see how we can lift this limitation later, but for now let’s go with it. For our example, we will choose [Ship Date].[Calendar Year] attribute, which only has 4 members. For the calculation itself, let’s pick example of calculating the maximum yearly sales. The classic formula for this is below:
Max(Descendants([Ship Date].[Calendar].CurrentMember, [Ship Date].[Calendar].[Calendar Year]), [Measures].[Internet Sales Amount])
Indeed, if we run the following query
WITH
MEMBER Measures.[Max Yearly Sales] AS Max(Descendants([Ship Date].[Calendar].CurrentMember, [Ship Date].[Calendar].[Calendar Year]), [Measures].[Internet Sales Amount])
SELECT
{Measures.[Max Yearly Sales]} ON 0
FROM (
SELECT { [Ship Date].[Calendar].[Calendar Year].&[2001], [Ship Date].[Calendar].[Calendar Year].&[2003] } ON 0
FROM [Adventure Works])
When we run this query, the result is $10,158,562.38. Quick check query:
SELECT {[Measures].[Internet Sales Amount]} ON 0
, [Ship Date].[Calendar].[Calendar Year] ON 1
FROM [Adventure Works]
it shows that the number we got was actually for year 2004. But it wasn’t even included by our subselect restriction which covered 2001 and 2003 only. So how does the solution works. First, we create another small table (or named query in DSV) which looks like following:
| Year | Encoding |
| 2001 | 1 |
| 2002 | 2 |
| 2003 | 4 |
| 2004 | 8 |
In general case we want to list every member in the lowest attribute by which the multiselect is expected, and assign values which are powers of 2. The idea here is that each member then gets assigned a bit in the 64-bit representation of the integer. Adding Encoding values for different members will be equivalent for OR’ing their representative bits.
Next we build a small measure group off this table, by linking Year column to the [Calendar Year] attribute in the [Ship Date] dimension, and Encoding is used as a measure with Aggregation Funciton Sum. But due to a special nature of Encoding values, the Aggregation Function here really behaves as OR.
This is the root of the idea. When there is a subselect restriction, all physical measures get automatic Visual Totals applied to them if the coordinate [Calendar Year] hasn’t been overwritten by formula. Applying Visual Totals to the Encoding measure will mean that it will be equal to a number, which has bits set to 1 only for those members which participated in subselect ! Now all we need to do is to decode back from the bitmask to the MDX set. It may seem that the easiest way to do it is to run Filter over [Calendar Year] and do filtering by AND’ing bitmasks. However, this won’t work, since using Filter over [Calendar Year] will cause [Calendar Year] attribute to be overwritten, and Visual Totals won’t get applied - the trick won’t work. Therefore, we need to do something a little more sophisticated - write a special purpose stored procedure which will behave exactly like Filter, only without coordinate overwrite. Such stored procedure is simple to write, below is the full source code for it:
public Set DecodeSet(Set set, System.UInt64 Encoding)
{
SetBuilder sb = new SetBuilder();
System.UInt64 bit = 1;
foreach (Tuple t in set.Tuples)
{
if ((bit & Encoding) != 0)
sb.Add(t);
bit *= 2;
}
return sb.ToSet();
}
Now, the expression for the calculated member will look like
Max(ASSP.ASStoredProcs.Util.DecodeSet([Ship Date].[Calendar].[Calendar Year], [Measures].[Encoding]), [Measures].[Internet Sales Amount])
And we can verify that it works correctly by running the following query:
WITH
MEMBER Measures.[Max Yearly Sales] AS Max(ASSP.ASStoredProcs.Util.DecodeSet([Ship Date].[Calendar].[Calendar Year], [Measures].[Encoding]), [Measures].[Internet Sales Amount])
SELECT
{Measures.[Max Yearly Sales]} ON 0
FROM (
SELECT { [Ship Date].[Calendar].[Calendar Year].&[2001], [Ship Date].[Calendar].[Calendar Year].&[2003] } ON 0
FROM [Adventure Works])
This is nice, now back to the case when we have more than 63 members for the attribute. For example, we want to go over days, not over years, and in Adventure Works we have 4 years worth of data, which translates to 1158 days. 1158 > 63. Well, the only solution in this case is to break the attribute into groups of 63 members and assign dedicated measure to each group. In our case this makes 19 new measures. It is not pretty, but at least it can be automated. The sproc will have to change to accept set as a second parameter, where the formula will pass MeasureGroupMeasures(”ShipDateMultiselect”), and sproc will have to make nested loops in order to move to the right member for the right measure. The exact code is left as exercise to the reader, if the reader is still interested.
MDX Studio forum
Source: Microsoft OLAP by Mosha Pasumansky [link]
Two dot releases of MDX Studio generated unusually high level of feedback both in email and in the comments in both mirrors of my blog (at sqljunkies and at sqlblog). In order to keep these and future discussions more structured and not to pollute my blog with lots of noise about every little feature and bug in MDX Studio, I would like to move these discussion to the proper forum. Vidas Matelis was kind enough to offer his site to host this forum. From now on, please post suggestions, bug reports and all other comments about MDX Studio in the MDX Studio forum at http://www.ssas-info.com/forum/MdxStudio. You don’t have to register in order to post (although you may if you want to). I will monitor this forum closely, and will post all the future updates there. I will still announce major version releases of MDX Studio in this blog for those who don’t want to suffer through the beta testing and want to get the finished and polished product.
MDX Studio v0.1.1 update
Source: Microsoft OLAP by Mosha Pasumansky [link]
Yesterday announcement of MDX Studio has generated lots of feedback, and I would like to thank everybody for taking your time to try this early version. Below I address most of the comments I received:
1. The error that got reported most is “Category not found” during execution of MDX query. I believe this error is due to inability to query perfmon counters. I will appreciate if people who run into this problem will check whether they get AS perfmon counters in the perfmon application. Especially the ones under “MSAS 2005:MDX”, “MSAS 2005:Cache” and “MSAS 2005:Memory” categories.
In the meantime I have implemented a workaround, if some of the counters are not found, the query will still execute. The updated version is v0.1.1 and can be downloaded from http://cid-74f04d1ea28ece4e.skydrive.live.com/browse.aspx/MDXStudio/v0.1.1
This should unblock MDX query execution piece. However, MDX query execution is the least interesting aspect of MDX Studio. There are dozens of tools which allow MDX query execution, and SSMS does this just fine. The real power of MDX Studio is under the Parse button and inside MDX expression debugger. Both these features are not affected by the perfmon issue, and I am very interested to get feedback on them.
2. Skydrive is not available in all countries, so MDX Studio cannot be downloaded. If your country is excluded from Skydrive beta, you can download MDX Studio from www.mosha.com/msolap/MDXStudio/v0.1.1/MDXStudio.exe
3. Color coding doesn’t work. How did I get pretty colors in the screenshots ?
Yes, color coding is not implemented yet. I wrote the queries in SSMS and copy pasted them into MDX Studio. Since MDX Studio still lacks many features (like functional drag and drop, color coding, open/save), I find it easier to write queries in SSMS and paste them back.
4. Drag and drop from metadata tree drops object caption instead of object unique name.
Thanks - this will be fixed in the future release.
5. Perfmon tab to show which aggregates were hit.
Cannot get this info through perfmon, but can get it from elsewhere. This will probably take a while until it is implemented.
6. Problem on Vista with error: Access to the registry key ‘Global’ is denied.
It worked on at least one another Vista installation, so I am not sure what the problem here. MDX Studio doesn’t try to access registry directly
7. Not everything works with AS2000.
I need to do more testing with AS2000.
Thanks again.
Announcing MDX Studio CTP1 (v0.1 Alpha)
Source: Microsoft OLAP by Mosha Pasumansky [link]
Redmond, WA September 18, 2007 - Mosha Pasumansky today announced the immediate availability of MDX Studio Community Technology Preview (CTP) 1, version 0.1 Alpha.
MDX Studio is tool which helps users of Microsoft Analysis Services to analyze complex MDX expressions, monitor performance characteristics of MDX queries and get insight into how MDX interacts with other UDM features such as attribute relationships. MDX Studio provides unique visualization representation of MDX expression and allows the user to interact with different stages of MDX execution. MDX Studio is aimed to be compatible with all the versions of Microsoft OLAP servers - OLAP Services 7.0, Analysis Services 2000, SQL Server Analysis Services 2005 and SQL Server Analysis Services 2008 (codename Katmai).
The licensing terms were not disclosed, however Mosha has explicitly noted, that everybody who was registered for the ‘Deep Dive to MDX’ preconference presentation at PASS 2007 summit will have lifetime eligibility for free copy for all future versions of MDX Studio.
The MDX Studio CTP1 v0.1 Alpha can be downloaded from http://cid-74f04d1ea28ece4e.skydrive.live.com/browse.aspx/MDXStudio/v0.1 (if this SkyDrive thing actually works). Mosha Pasumansky welcomes all the feedback, but especially critique and bug reports. The current 0.1 Alpha version is pretty much work in progress, with many features not yet implemented (for example query execution is synchronous and Cancel doesn’t work, drag and drop is almost non-existing etc).
The brief explanation of how this thing works follows:
MDX Studio is very similar to SSMS, but with deeper focus on MDX. It includes connection string builder which allows user to specify different connection string properties categorized in groups - Security, MDX Semantics, Performance Hints etc.
Below is a screenshot of query execution in MDX Studio. Note, that on the left hand side, some of the useful Perfmon counters are automatically computed as deltas for just this query. Counters can be configured through Tools -> Options -> Perfmon menu.
Perhaps most important feature of MDX Studio is behind the “Parse” button. In SSMS, it simply validates that the MDX query is syntactically correct. MDX Studio doesn’t support yet parsing of MDX SELECT queries or other MDX commands, but it supports parsing of MDX expressions. Below is sample screenshot of the parse tree for the expression of simple calculated member which computes Reseller Sales Amount for all products containing string “Front” in their names.

But what is more interesting, the user can now interact with this tree and see how each subexpression evaluates. Let’s position the cursor on CurrentMember node under Descendants. We see in the Results pane, that this current member evaluates to Components member. Note the ‘Current Context’ window on the right hand side, where user can change the coordinates of the current context, thus affecting results of the MDX expression evaluation.
Next, let’s move cursor on Descendants, and see what it evaluates to. The Results pane now displays the set which is result of applying Descendants to the Components member.
Let’s make things even more interesting. We are running Filter over this Descendants set which checks boolean expression. Let’s see what it evaluates to. MDX Studio understands that the boolean condition needs to be evaluated in the context of iteration over Descendants set, therefore the result looks like below:
Let’s dig in deeper. Why some of the comparisons are true and other false. Now we can look inside boolean expression, and position cursor over VBA!Instr function. Again, MDX Studio is intelligent enough to show how Instr evaluates over the set inside Filter:
We could go on and on with this. Once we computed Filter, we now can see how Sum was calculated:
Hopefully this gives an idea what MDX Expression debugger is capable of doing.
Lastly, there is a handy “Coordinates” button, which allows the user to see effect of applying strong hierarchies. Here is a screenshot which shows how August 2003 decodes other attributes:
Of course, it is possible to use any MDX expression here as well. So if we will add FirstChild, we will get to the August 1st 2003:
There are plenty of plans for new features (like support for MDX Scripts) and finishing existing features, but at this point I believe there is enough functionality implemented to start sharing it with the community and gathering the feedback. Please let me know what do you think, what do you like and again, especially what you don’t like so I can fix it.
PASS Presentation canceled :(
Source: Microsoft OLAP by Mosha Pasumansky [link]
I am very upset about it, but my tomorrow’s PASS preconference presentation isn’t going to happen. My flight to Denver today was canceled, after having spent few hours in Seattle airport I realized that there was absolutely no way for me to get to Denver before tomorrow morning. I went through all airlines, asking for any flight, any connection, and every other flight was full and sold out. I apologize to everybody who registered for this presentation. I hope PASS will be able to accommodate you for another session. I am very upset, for I have spent lots of time during last month preparing for this presentation. I believe that I gathered plenty of unique material not available anywhere else, knowledge that would have benefited MDX practitioners. I even developed a new tool (code name ‘MDX Studio’), especially for this presentation, which features MDX expression debugger, MDX query execution performance monitor and attribute overwrite viewer - this was supposed to be a surprise announcement during my presentation. If only PASS was in Seattle, just like last year, and year before that.
Anyway, I apologize again, and once I calmed down, I will start thinking about other ways to share the content of the presentation. It will take me months to write all of it down, so I hope another opportunity for speaking will come soon again, this time in Seattle or somewhere close, like Vancouver or Portland - where I could just drive…
Outline of the ‘Deep dive to MDX’ PASS presentation
Source: Microsoft OLAP by Mosha Pasumansky [link]
My PASS presentation is quickly approaching. Currently there are about twice as many people registered for it than I thought there will be. I am getting a little nervous about it. After all, 7 hours is a lot of time, and I am not sure what people expect from this session. Since there is still time to make changes, I decided to publish the tentative outline of the presentation here, in the hope to get more feedback. The outline is not final, the order of subjects will change for sure, but this should give a good idea what to expect from the session. If this isn’t what you had in mind - there is still time to cancel and go for another session. (For example, I think about half of the presentation is based on former blog entries (although somewhat expanded), so if you read all my blogs, then you already know at least half the content).
Introduction
Architecture
- SE and FE
Subcube
- Granularity
- Single grain/mixed grain
- Slice
- DataIDs
- OR slice
- Slice below granularity
- Arbitrary shape
SE
- Architecture
- Sonar
- Cache Ratio
SE Measure group Cache
- Cache registry
- Aggregate & Filter from cache
- Prefetching
- Cache sharing
- Slice below granularity
- Visual totals
- Subselects
- Arbitrary shapes
- MDX stiching
SE query plan
- Aggregations
- Scan vs. indexes
- Bitmap indexes
- Segment compression
- Arbitrary shapes
- Normalize mixed grain with Descendants
- Slice
- Autoslice
- OR slice
SE dimensions
- Natural vs. unnatural hierarchies
- Dimension cache
FE
- Architecture
FE Cache
- Global scope
- Security
- Dynamic
- Non deterministic functions
- Visual totals/subselects
- Hidden members
MDX Caching techniques
- Value caching
- Subexpressions
- KPI/cell security/calc properties
- Today
- Tuple caching
- Dynamic
- Static/dynamic/alias
FE execution
- Sonar
- Calc covers
- Calculate
FE execution plans
- Cell by cell
- Bulk evaluation (aka block computation mode)
- Space transformations
- Space partitioning
Forcing block mode
- Optimized MDX functions
- Static literal references
- Multiplication
- ValidMeasure vs. IgnoreUnrelatedDimensions
- Writing over NULLs
- IIF vs. IF vs. SCOPE
- Cache Ratio
- Hidden members
Aggregate function
- Implicit aggregations
- Calcs on leaves
- No cascading
- No NonEmpty
Scenarios
- Ratios
- Comparison to previous period
- Running sum
- Inventory
- Moving average
Non Empty
- Sonar
- SE data
- Calculated members
- Calculated measures
- NON_EMPTY_BEHAVIOR
Exists and Autoexists
- Autoexist
- cell
- Position in crossjoin
MDX
Sequence
- Bootstrapping
- MDX Script
- Static vs. Dynamic statements
- Calculation properties
Sets
- Iterator pipelining
- Containers
- Rankers
- Enumerated set and Materialization
- Crossjoin
- cannot operate on a set with more than 4,294,967,296 tuples
Strong hierarchies
- Attribute relationships
- Diamond shape
- Decoding attributes
- Cube space
- Non-idempotence of CurrentMember
MDX in UDM
- Existing vs. Descendants
- Strong hierarchies vs. User hierarchies
Calculation precedence
- History
- SOLVE_ORDER
- Pass
- Implicit aggregations
- MDX Script order
- CREATE MEMBER vs. CREATE
Calculated members
- Non-aggregatable
- Multiselect on assignment
- Excel 2007
WHERE vs. subselects
- What WHERE does
- What subselect does
- Implicit Exists
- Coordinate overwrites
- Overwrite history
Multiselect
- WHERE
- Existing
Semiadditive measures
- LastNonEmpty
- RunningSum + LastChild
- Manual with member caching
Security
- Dimension security
- Orthogonal design
Dimension query
- How to query dimension
- Dimension cube
- Enumeration
- NonEmpty
- Extract
- Non SE materializable
Stored procedures
- Best practices
- ASSP project
