How You Can Win in a Tougher Market?
Source: Keep It Simple [link]

Not sure how I missed this Selling Power editorial from a couple of months ago. Thanks to our friends at CSO Insights for posting the link to it on their homepage.
The editorial by Gerhard Gschwandtner, publisher of Selling Power, provides an overview of the CSO Insights 2008 Sales Optimization Survey. He asks the following questions to VPs of Sales:
- Who generates your leads?
- What happens to the deals in your pipeline?
- Do your salespeople research prospects before each call?
- Do your salespeople sell value and avoid discounting?
- Do you sweep failures under the rug?
- Do you hire reps that succeed?
- What’s your goal?
My favorite part of Gerhard’s overview is the following statement:
“The survey says 43.8 percent of the sales organizations need improvement in these areas. Your CRM solution isn’t going to help. What will help is a best-practice sharing system that harnesses the collective intelligence of your sales organization. The SavoGroup.com can connect all reps with the best practices of your top performers. LucidEra and Business Objects offer better analytics that drive better decisions.”
You can read the full article here and download the full survey here (registration required).
Error in SSIS with Oracle Date Columns
Source: RDA Corp - Business Intelligence and SQL Server [link]
Just Some Excel Helpers and Respective Unit Tests
Source: Loosely Coupled Human Code Factory [link]
Just some semi-elegant stuff I’ve conjured up recently to get column names in Excel and do some other functional things. I wrote this code in a true TDD fashion. It still, for some reason, seems odd to me to write tests in VB, since it was always a language that encouraged dirtiness. But alas, this is fairly solid stuff.Imports NUnit.Framework
<TestFixture()> _
Public Class HelperTests
<Test()> <Category(”Presenter Helper Tests”)> _
Public Sub TestHelperGetExcelColumnLetterSingleLetter()…(read more)
DBLinks the sequel
Source: Oracle Warehouse Builder (OWB) Weblog [link]
Wal-Mart chooses Oracle Business Intelligence
Source: Frank Buytendijk Blog [link]
DATAllegro to move to Windows/SQL Server after Microsoft buyout
Source: Blogging about all things SAS [link]
Carrying on from my earlier post, there is an interesting post by the founder of DATAllegro on his blog here, stating that now Microsoft have purchased DATAllegro they will be migrating their appliance from Ingres/Linux to SQL Server/Windows.
Over on the Microsoft BI Blog there is a post lauding how they have rave reviews for DATAllegro warehouses with data volumes over 200+ TB and amasing query response times.
So what will the move to Widows and SQL Server bring?
What will the migration path be like?
How will customers feel about the requirement to constantly apply windows patches to their “appliance”?
How many 200TB SQL Server/Windowswrehouse sites are out there at the moment?
Given the move from one technology stack to another is always fraught with difficult (just think SAS FM, SPM and ABM) then I would hate to be a DATAllegro customer with a 200TB warehouse at the moment.
Transit :: The Easy Way
Source: Loosely Coupled Human Code Factory [link]
Yesterday I biked it to work as I have been for weeks now. Except today I rode the old trusty Streetcar & #35 into the office. On the way home I hit up the #35 as always, but transferred downtown to the MAX to PGE Park. From there I just walked the rest of the way home, all of a mighty ten minutes. The MAX, was ridiculously packed by the time it reached Pioneer Square, and by the time it reached PGE Park barely anyone else could physically fit in the thing. I’m glad they…(read more)
Using a DBLink Location for ETL
Source: Oracle Warehouse Builder (OWB) Weblog [link]
User Stories
Source: Loosely Coupled Human Code Factory [link]
The mumbo jumbo of use case, requirements gathering, and all that are a necessary evil of software design. We developers cannot unfortunately just haphazardly build our UIs and other such things. It is necessary that in some way the user interface is dictated by the users. In the past, especially with waterfall, this has been a massive failure point for companies having software built. The UI not reflecting actual use or features of what was actually needed (vs. what is asked for,…(read more)
Five questions with Dwight Mouton
Source: The sascom magazine blog [link]
Dwight Mouton is a Product Manager for SAS Americas.
What are you working on right now?
I’m quite busy right now. I’m finalizing requirements for Marketing Optimization version 5.3., settling in after a move from Winston Salem to Cary, and looking for customers who will discuss Marketing Optimization with me.
What are you reading?
I’m currently reading Lance Armstrong’s War by Daniel Coyle and The Sorcerer’s Stone from the Harry Potter series. Last month I read Competing on Analytics and Super Crunches while traveling. Now I’m trying to decide on my next business-related book – I usually alternate one fiction and one non-fiction. I also read the articles on SAS intranet daily.
What industry trends are you watching?
Since I spent 15 years in banking before coming to SAS in 2007, I’m usually following that industry pretty closely. I’m starting to add Telecom, Retail, and Insurance to my list because they are popular with MO.
What do you do to unwind after a long day’s work?
After a tough day, a good hour-long session of Grand Theft Auto gets my mind off things. To really decompress I love to go for a 30 minute run or an hour bike ride. Once I’m decompressed, hanging out with my family is one of my favorite pastimes.
What music do you like to listen to while exercising?
If I’m running I really enjoy the podcast entitled Podrunners which has different programs depending on your workout level. Since I recently started up running again, I’m on a training plan that will get me prepared for a competitive 5K in 8 weeks. When I’m working with weights I prefer something hardcore; Metallica is my favorite by far.
Data quality: definiciones
Source: Information Management [link]
A
Flood Gates on Operational BI on the Verge Of Breaking Wide Open
Source: Blog: Mike Ferguson [link]
It seems that everywhere I look the number of vendors gearing up for Operational BI is just massing as if awaiting an onslaught on the market. Among them include HP, IBM, InforSense, Progress, Tibco, SL, ThinkAnalytics and many more. Key to this is event driven data integration, in-memory data, predictive analytics and rules engines. Vendors like SL have even released a data cache for in-memory analytics with its RTView product. Progress is also pushing with its Aparma product and long term pioneer ThinkAnalytics are also doing well. Clearly the giants are also moving to get all their pieces in play. IBM’s acquisition of German rules vendor iLOG looks like it could also be used in the world of on-demand and automated decisioning. Also SAP Business Objects Labs have released a prototype on event-driven BI and seem to be partnering with SPSS on predictive analytics.
It’s all heating up, yet in the UK it staggers me the number of companies that are not seeing the benefit of these emerging technologies. Some verticals would reap massive benefits by exploiting this technology. If you are interested in Operational BI drop me an email at ukinfo@intelligentbusiness.biz or check out some of my articles on this either on my B-Eye-Network channel or on my web site http://www.intelligentbusiness.biz/ArticlesSignUpPage.html and http://www.intelligentbusiness.biz/ArchiveArticlesFront.html
Aggregate Facts and Dimensions
Source: Tod means Fox [link]
Aggregate fact tables are special fact tables in a data warehouse that contain new metrics derived from one or more aggregate functions (AVERAGE, COUNT, MIN, MAX, etc..) or from other specialized functions that output totals derived from a grouping of the base data. These new metrics, called “aggregate facts” or “summary statistics” are stored and maintained in the data warehouse database in special fact tables at the grain of the aggregation. Likewise, the corresponding dimensions are rolled up and condensed to match the new grain of the fact.
These specialized tables are used as substitutes whenever possible for returning user queries. The reason? Speed. Querying a tidy aggregate table is much faster and uses much less disk I/O than the base, atomic fact table, especially if the dimensions are large as well. If you want to wow your users, start adding aggregates. You can even use this “trick” in your operational systems to serve as a foundation for operational reports. I’ve always done this for any report referred to by my users as a “Summary”. (As an aside, there is a difference between an “aggregate” and a “summary”. I’ll explore these differences in my next post.)
For example, take the “Orders” business process from an online catalog company where you might have customer orders in a fact table called FactOrders with dimensions Customer, Product, and OrderDate. With possibly millions of orders in the transaction fact, it makes sense to start thinking about aggregates.
To further the above example, assume that the business is interested in a report: “Monthly orders by state and product type”. While you could generate this easily enough using the FactOrders fact table, you could likely speed up the data retrieval for the report by at least half (but likely much, much more) using an aggregate.
Here, using the atomic transaction FactOrders table:
SELECT c.STATE, p.product_type, t.YEAR, t.MONTH, SUM(f.order_amount) FROM FactOrders f JOIN DimCustomer c ON c.CustomerID = f.CustomerID JOIN DimProduct p ON p.ProductID = f.ProductID JOIN DimTime t ON t.YEAR = DATEPART(yy, f.YearMonthID) AND t.MONTH = DATEPART(mm, f.DateID) GROUP BY c.STATE, p.product_type, t.YEAR, t.MONTH
The aggregate is querying much less data and queries against time are now much simpler. In my non-scientific tests, the following query ran many times faster (a few seconds compared to about 30 seconds!).
SELECT c.STATE, p.product_type, t.YEAR, t.MONTH, SUM(f.order_amount) FROM FactOrders_Agg1 f JOIN DimCustomerState c ON c.CustomerStateID = f.CustomerStateID JOIN DimProductType p ON p.ProductTypeID = f.ProductTypeID JOIN DimMonth t ON t.YearMonthID = f.YearMonthID GROUP BY c.STATE, p.product_type, t.YEAR, t.MONTH
Creating the Fact and Dimensions
To implement, you will need to roll up your fact table by the hierarchies found in your dimensions. The result will be a new fact table, a set of new accompanying dimensions at the grain of the fact, and all new foreign keys for mapping. I usually name the fact table the same as the base fact with some meaningful suffix appended to the end. In SSMS, this keeps the aggregates with the fact in my object explorer. Dimensions usually get new names (like CustomerState and ProductType) and should be conformed so that they can be reused across business processes. You could even create views instead of new dimensions, but this does not eliminate the need to regenerate new surrogate keys.
When rolling up dimensions, you are provided with an excellent opportunity to perform aggregate functions on the dimension itself and store the results as new attributes. For example, you may want to know how many customers are living in each state. This could be used as the denominator in some population calculation you plan to use against the aggregate fact. Your new dimension might therefore look like the following:
SELECT Cust_Country, Cust_Region, Cust_State, COUNT(Cust_ID) FROM DimCustomer GROUP BY Cust_Country, Cust_Region, Cust_State
The most obvious aggregate function to use is COUNT, but depending on the type of data you have in your dimensions, other functions may prove useful. Just be warned: If you find that aggregate functions are being used a lot on your dimensions, you may need to revisit your design. There may be opportunities to pull out those metrics into existing or new fact tables!
Generating aggregates is largely an incremental process, where you examine query and reporting usage looking for places to improve performance. Aggregates stored in the RDBMS are maintained through ETL and/or your OLAP engine.
A Note About OLAP Aggregates
Theoretically, storing aggregates in a fact table in a RDBMS is the same as storing them in an OLAP cube. In OLAP storage, aggregates are precalculated summaries of data from the different dimensions of the cube, such that a query that seeks to know the aggregate (sum) of some metric (order amount) for X (customer state) and Y (product type) over T (monthly orders) would only need to look inside the cube at those exact coordinates to get the answer. I won’t pretend to know how this stuff is physically stored, but OLAP engines across the board offer better performance, management, and navigation mechanisms for the aggregations than is available through the RDBMS (even when using Indexed or Materialized Views).
Next post, I’ll write some thoughts on the differences between “Summaries” and “Aggregates”!
Many-to-Many Dimension bug
Source: Chris Webb's BI Blog [link]
Jon Axon mailed me recently with an interesting bug he’d come across concerning many-to-many dimensions, where it looks like the AS engine is trying to be a bit too clever for its own good as far as query optimisation goes. It’s reproducible on AS2005 SP2 and Katmai RC0, and I logged it on Connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357469
Here’s an example on Adventure Works that Jon gave me. The following query:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
Shows the total number of customers as being 18484. Looking at the Sales Reason dimension (which has a many-to-many relationship with the measure group Customer Count is from), if you run a relational query on the underlying data source you can see that not every order was associated with a sales reason - this is the key point here. Now if we run the query:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons]
We can see that it returns the same result as the first query, 18484, as you would expect. But what if we want to find the number of customers who specified a sales reason? You would think that the following query would do that:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons].Children
But it doesn’t, it returns 18,484 again. However if you run this query which should be equivalent to the previous query:
SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE {[Sales Reason].[Sales Reason].&[1],
[Sales Reason].[Sales Reason].&[2],
[Sales Reason].[Sales Reason].&[3],
[Sales Reason].[Sales Reason].&[4],
[Sales Reason].[Sales Reason].&[5],
[Sales Reason].[Sales Reason].&[6],
[Sales Reason].[Sales Reason].&[7],
[Sales Reason].[Sales Reason].&[8],
[Sales Reason].[Sales Reason].&[9],
[Sales Reason].[Sales Reason].&[10]}
It returns the value we’re looking for, 17022. Not good! What I suspect is happening here is that when the query optimiser sees the expression [Sales Reason].[Sales Reason].[All Sales Reasons].Children in the Where clause, it assumes that it’s equivalent to [Sales Reason].[Sales Reason].[All Sales Reasons] which it is in most cases, just not here. Interestingly if you look in Profiler and run the queries on a cold cache, you see exactly the same activity when [Sales Reason].[Sales Reason].[All Sales Reasons].Children is in the Where clause as when [Sales Reason].[Sales Reason].[All Sales Reasons] is there; but when you explicitly list all of the individual sales reasons, as in the last query, because Sales Reason has to resolve itself through the Internet Sales Order Details dimension which is ROLAP, you see SQL queries being fired off against the relational database.

Scripts for loading up DBGEN TPC-H data
Source: oramoss oracle [link]
If you’re interested in creating a TPC-H schema for testing purposes, then the following scripts may be of use to you:
Unix Scripts:
Multi Load TPCH
Load TPCH Stream
SQL*Loader Control files:
REGION Table
NATION Table
SUPPLIER Table
CUSTOMER Table
PART Table
PARTSUPP Table
ORDERS Table
LINEITEM Table
You may wish to read and check them before you use them - they’re not exactly rocket science but they seem to do the job.
I have all the files in the same directory for simplicity sake.
I then use them to create a scale factor 1 target TPC-H schema using the following calls:
# “l” loads the REGION and NATION tables
./multi_load_tpch.sh 1 l “tpch/tpch@test” 1
# “s” loads the SUPPLIER table
./multi_load_tpch.sh 1 s “tpch/tpch@test” 10
# “c” loads the CUSTOMER table
./multi_load_tpch.sh 1 c “tpch/tpch@test” 10
# “p” loads the PART and PARTSUPP tables
./multi_load_tpch.sh 1 p “tpch/tpch@test” 10
# “o” loads the ORDERS and LINEITEM tables
./multi_load_tpch.sh 1 o “tpch/tpch@test” 10
Obviously, you need to change the connection string to match your environment.
Caveats:
- Obviously, they are supplied as is - use at your own discretion and risk.
- You need to have created the target schema tables and made sure they are empty as the SQL*Loader control files use APPEND.
- Bear in mind that choosing too high a number of parallel streams (the last parameter in the calls) will overload your machine so try and balance it against the available system resources.
Bugs, issues or questions, please get in touch…enjoy.
Product Volatility: Optimizing MDX with MDX Studio
Source: Microsoft OLAP by Mosha Pasumansky [link]
In this article we will show how to take typical, everyday MDX and optimize it by simply following advices of MDX Studio Analyzer.
As a starting point, I will take “Product Volatility” example as published by Thomas Ivarson in his blog. It is a perfect example, because Thomas took a real business problem, and implemented it using very clean and straightforward MDX, just like I expect many MDX practitioners do. This MDX can be optimized, but some of these optimizations might not be obvious. Running MDX Studio Analyzer and following its advices takes the guesswork out of equation.
Here is the MDX that computes total number of products, number of products with sales increased from last month and number of products with sales decreased from last month:
WITH MEMBER Measures.TotNumberOfProducts AS Count ( Descendants ( [Product].[Product Categories].CurrentMember ,[Product].[Product Categories].[Product] ) ) MEMBER Measures.NumberOfSoldProductsIncreasing AS Count ( Filter ( NonEmpty ( Descendants ( [Product].[Product Categories].CurrentMember ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] ) , ( [Date].[Calendar].CurrentMember ,[Measures].[Internet Sales Amount] ) - ( [Date].[Calendar].PrevMember ,[Measures].[Internet Sales Amount] ) > 0 ) ) MEMBER Measures.NumberOfSoldProductsDecreasing AS Count ( Filter ( NonEmpty ( Descendants ( [Product].[Product Categories].CurrentMember ,[Product].[Product Categories].[Product] ) ,[Measures].[Internet Sales Amount] ) , ( [Date].[Calendar].CurrentMember ,[Measures].[Internet Sales Amount] ) - ( [Date].[Calendar].PrevMember ,[Measures].[Internet Sales Amount] ) < 0 ) ) SELECT { [Measures].[TotNumberOfProducts] ,[Measures].[NumberOfSoldProductsIncreasing] ,[Measures].[NumberOfSoldProductsDecreasing] } ON 0 ,[Date].[Calendar].[Month] ON 1 FROM [Adventure Works]
Executing this query with hot cache in MDX Studio produces the following perfmon counter statistics:
Time : 359 ms
Calc covers : 85
Cells calculated : 7672
Sonar subcubes : 79
SE queries : 76
Cache hits : 76
Cache misses : 3
Cache inserts : 3
Cache lookups : 79
The time looks good, but only because both the cube and the query are small. The number of SE queries should raise a red flag here. With only 114 cells in the result, why there were 76 SE queries ? Obviously, on more serious cubes, this will become a major performance issue. So, let’s run MDX Studio Analyzer and see what it has to say about this query. We get back the following list:
| Line | Col | Message | Link |
| 3 | 6 | If you are trying to count number of members in current selection - consider introducing special measure group for this dimension | More Info |
| 7 | 41 | MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect | More Info |
| 12 | 6 | Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) | More Info |
| 20 | 45 | MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect | More Info |
| 27 | 34 | MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect | More Info |
| 32 | 34 | MDX function ‘PrevMember’ may raise an error or produce non-desired result when user applies multiselect | More Info |
| 39 | 6 | Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) | More Info |
| 47 | 45 | MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect | More Info |
| 53 | 14 | Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache |
Let’s go over this list and fix issue by issue.
Line 3, Col 6: If you are trying to count number of members in current selection - consider introducing special measure group for this dimension
The link points to “Counting days in MDX” blog. While we count here Products and not Days, the same technique applies here. These types of questions are best done not in MDX, but by introducing measure group, which only contains Products dimension, and single measure [Number Of Products] of type Count.
Line 7, Col 41: MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect
Now, this is no longer a problem because of how we solved the previous warning ! Indeed, if we wanted to see total number of products across Bikes and Accessories categories, we would’ve added the following WHERE clause:
WHERE {[Product].[Category].&[1],[Product].[Category].&[4]}
It fails with the original query, but works fine with Count measure [Number of Products]
Lines 12/39, Col 6: Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL)
This is a common construct, and as shown in the link to the “Optimizing Count(Filter(…)) expressions in MDX”, it performs much better if rewritten to use Sum. But before we do this rewrite, let’s take a look at the next warning:
Line 53, Col 14: Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache
If we click on this message, the expression in question is the following:
(
[Date].[Calendar].CurrentMember
,[Measures].[Internet Sales Amount]
)
-
(
[Date].[Calendar].PrevMember
,[Measures].[Internet Sales Amount]
)
Indeed, this expression represents Sales Growth, so we can isolate it into separate calculated member. But before we write the exact expression for the calculated member, let’s look deeper into the original expression. It features Count(Filter(NonEmpty(…))) construct, i.e. only products which had sales in current month are considered for increasing/decreasing calculation. I.e. even if product had sales in the previous month, but not in the current month, it won’t be counted in the number of products with decreasing sales. In order to repeat the same logic, the expression would be
[Sales Growth] =
IIF
(
IsEmpty([Measures].[Internet Sales Amount])
,NULL
,
[Measures].[Internet Sales Amount]
-
(
[Date].[Calendar].PrevMember
,[Measures].[Internet Sales Amount]
)
)
For more details why IIF(cond, NULL, exp) is good for performance, read “Performance of IIF function in MDX” blog. Now, putting all of together so far, we will get the following query:
WITH MEMBER [Sales Growth] AS IIF ( IsEmpty([Measures].[Internet Sales Amount]) ,NULL , [Measures].[Internet Sales Amount] - ( [Date].[Calendar].PrevMember ,[Measures].[Internet Sales Amount] ) ) MEMBER PositiveGrowth AS IIF ( [Measures].[Sales Growth] > 0 ,1 ,NULL ) MEMBER Measures.NumberOfSoldProductsIncreasing AS Sum ( Descendants ( [Product].[Product Categories].CurrentMember ,[Product].[Product Categories].[Product] ) ,[Measures].[PositiveGrowth] ) MEMBER NegativeGrowth AS IIF ( [Measures].[Sales Growth] < 0 ,1 ,NULL ) MEMBER Measures.NumberOfSoldProductsDecreasing AS Sum ( Descendants ( [Product].[Product Categories].CurrentMember ,[Product].[Product Categories].[Product] ) ,[Measures].[NegativeGrowth] ) SELECT { [Measures].[Number Of Products] ,[Measures].[NumberOfSoldProductsIncreasing] ,[Measures].[NumberOfSoldProductsDecreasing] } ON 0 ,[Date].[Calendar].[Month] ON 1 FROM [Adventure Works];
When executed in MDX Studio, we get the following perfmon statistics:
Time : 171 ms
Calc covers : 15
Cells calculated : 1993
Sonar subcubes : 6
SE queries : 3
Cache hits : 7
Cache misses : 5
Cache inserts : 5
Cache lookups : 12
This is now much better than before, there are only 3 SE queries. All of the performance warnings have been addressed, but we still have warnings about CurrentMember not working well with multiselect. Indeed, when we put multiselect on product categories on Bikes and Accessories, we get errors for number of increasing and decreasing products. Links to “Writing multiselect friendly MDX calculations” and “Multiselect friendly MDX for calculations looking at current coordinate” blogs are helpful to understand what the problem is. Unfortunately, there is no universal solution, and each case needs to be evaluated differently.
In this scenario, we can use apply to Sum(Descendants(…), exp) very similar technique that we applied to Count(Descendants(…)). The idea is not to use Descendants function, which forces us to use CurrentMember, which in turn fails on multiselect, but to put the expression “exp” directly on the Product attribute. In order to do that, we need to create fake measure in the measure group - “Num Products Increased”. It is bound to NULL expression, so this measure will always be NULL, and therefore won’t take any space in storage. But since this measure will have aggregation function Sum, it will properly aggregate when assigned expressions inside MDX Script.
We will use MDX Studio to emulate MDX Script too (MDX Studio has a feature, where it automatically uses cube selected in the dropdown list to resolve current cube used in script statements).
Here is the MDX:
CREATE [Sales Growth] = IIF ( IsEmpty([Measures].[Internet Sales Amount]) ,NULL , [Measures].[Internet Sales Amount] - ( [Date].[Calendar].PrevMember ,[Measures].[Internet Sales Amount] ) );
( [Product].[Product Categories].[Product] ,[Measures].[Num Products Increased] ) = IIF ( [Measures].[Sales Growth] > 0 ,1 ,NULL );
SELECT { [Measures].[Number Of Products] ,[Measures].[Num Products Increased] } ON 0 ,[Date].[Calendar].[Month] ON 1 FROM [Adventure Works] WHERE { [Product].[Category].&[1] ,[Product].[Category].&[4] };
It’s performance characteristics are even slightly better than the query before:
Time : 31 ms
Calc covers : 6
Cells calculated : 76
Sonar subcubes : 1
SE queries : 1
Cache hits : 2
Cache misses : 0
Cache inserts : 0
Cache lookups : 2
Not to mention the fact that it works just fine with multiselect on products ! This is a great example, where fixing MDX to work properly in multiselect scenarios, has the side effect of improving performance.
We still have warnings about PrevMember with respect to multiselect on the Time dimension. It is more difficult problem, since the definition of “Sales Growth” when there is multiselect on the Time dimension is ambiguous.
Where to find BI blogs
Source: The sascom magazine blog [link]
I still owe you a running list of business intelligence and analytics blogs, but I’m throwing my hands up in defeat. There are too many, and the content is too varied to come up with a good, focused list. So, instead of listing individual bloggers, I’m going to list sites that aggregate or host multiple business intelligence blogs.
If you’re just starting to search out blogs in the BI space, consider this your starting point. Find a few bloggers you like at these sites, then check out their sidebars to see what other blogs they read. Before you know it, you’ll have your own list of BI blogs that’s too long to manage.
- BI Blogs: "Bringing together Business Intelligence voices from across the web." This site is like the reader’s digest of BI blogging. It aggregates content from almost 100 different BI blogs all around the blogosphere. You can read or subscribe to everyone’s blogs from the single stream or click through to discover dozens of new blogs. Fill in the contact form to be included.
- BeyeBLOGS: "Your voice in the business intelligence community." Part of the BeyeNETWORK, this blogging platform is open to anyone who wants to blog about business intelligence and related topics. Start by browsing the complete list of BeyeBLOGs, or sign up for your own blog for free.
- Intelligent Enterprise Weblogs: Their individual blogs are listed as channels. I tend to read a lot of Cindi Howson, Seth Grimes and Neil Raden.
- IT Toolbox BI blogs: "A professional blogging community of technology CEOs and executives, speakers, and industry gurus." I mentioned IT Toolbox in my list of BI communities. Their blog platform is also open to anyone who wants to blog, and I like that their blog listings identify inactive bloggers and popular posts. Start your own blog here.
As always, send me any additions you might have. Or take a minute to include your favorite BI blogs in the comments. I know there are a lot of good blogs that aren’t covered by the sites on this list.
Advantages/Disadvantages to Commercial Open Source
Source: Business Intelligence in Real Life [link]
I stumbled across a good article by the Business Intelligence and Performance Management Institute today that talks about Open Source Business Intelligence. While the information in it was very good, I have some things to point out.
First things first, here is what they outline as the advantages of Open Source BI
- Low Cost
- !supportLists]–>Quicker fixing of Bugs
- Easier to integrate with the other applications as the API used are not proprietary.
class=”MsoNormal”>Then the article goes on to list the disadvantages, this is the funny part…my comments are in green:
- Its end to end BI capabilities and robustness has not been proven over a time-span and across various organizational scenarios. (Actually, many companies use Open Source BI. BI is a very delicate subject for companies to talk publicly about because BI is almost always viewed as a strategic competitive advantage. Companies implement BI to do one of two things, lower costs or increase revenues, hence making them much more competitive. Think about this, if they can achieve this “competitive advantage” at about 20% the cost of proprietary BI. Not sure if you are familiar with game theory, but game theory basically says that as long as you mimic what your competitors are doing you will remain in the same spot, so if you are ahead and always mimic them strategically, you will always stay ahead, so on and so forth. Take this game theory and apply it to BI, if you can mimic how your competition is using BI, however, can do it at 20% of the cost, you will increase your gain on your competition. HMMM, no wonder companies don’t want to publicize what they are doing with Open Source BI.)
upportLists]–>- The ease of use of proprietary BI platforms, and user friendly documentation, though, open source has more comprehensive and transparent documentation (I’m confused, doesn’t your “though” comment nullify what you just said?)
!supportLists]–>- The Enterprise BI platform providers bring in a whole framework of BI implementation, which is a critical element of success. (Read my point for the first disadvantage)
/p>
Some companies are willing to share how they are using Open Source BI, for example Mozilla
Here is a link to the complete article:
http://www.bipminstitute.com/template/content_desc.php?content_id=867
Video Tutoriales de Pentaho Kettle
Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]
Este verano… para no aburrirse los que tengan un poco de tiempo… viene con tutoriales y videos de ejemplo para seguir aprendiendo… en este caso del ETL de Pentaho.. Kettle
Acceder a Video Tutoriales
BI / DW Project Documentation - The Power of a Wiki
Source: Blogging about all things SAS [link]
Great post over on Hexware’s blog with recommendations on what you should be documenting as part of your BI / Data Warehouse project.
I know of a couple of SAS Data Warehouse sites that are using a Wiki as a collaborative environment for documentation and we are using Mediawiki on a SAS BI/DW project I am working on at the moment.
We are using the Wiki to document all our processes, development standards, business rules and content (when we get time of course).
One of the things I struggle with is how to integrate the technical metadata, that resides in the SAS Metadata Repository, with the business metadata we are storing in the Wiki.
To that end at sasInct we are developing a web service that will interogate the SAS Metadata Repository and then display the results via a Mediawiki plugin. The goal is to enable the technical metadata to be streamed to the Wiki in realtime and remove the need to copy and paste etc, or maintain two version.
We are also starting to make use of the description fields in Information Maps to provide business metadata to users when running a report, but we haven’t come up with an approach yet of how to populate that from the Wiki, or how to enable a user to click on an object in a report and open the relevant Wiki page with the business metadata.
Feel free to comment if you have done something cool in this space.
