BI Blogs

Bringing together Business Intelligence voices from across the web

How You Can Win in a Tougher Market?

Posted on the July 31st, 2008. Read 72 times

Source: Keep It Simple [link]


Not sure how I missed this Selling Power editorial from a couple of months ago. Thanks to our friends at CSO Insights for posting the link to it on their homepage.

The editorial by Gerhard Gschwandtner, publisher of Selling Power, provides an overview of the CSO Insights 2008 Sales Optimization Survey. He asks the following questions to VPs of Sales:

  • Who generates your leads?
  • What happens to the deals in your pipeline?
  • Do your salespeople research prospects before each call?
  • Do your salespeople sell value and avoid discounting?
  • Do you sweep failures under the rug?
  • Do you hire reps that succeed?
  • What’s your goal?

My favorite part of Gerhard’s overview is the following statement:

“The survey says 43.8 percent of the sales organizations need improvement in these areas. Your CRM solution isn’t going to help. What will help is a best-practice sharing system that harnesses the collective intelligence of your sales organization. The SavoGroup.com can connect all reps with the best practices of your top performers. LucidEra and Business Objects offer better analytics that drive better decisions.”

You can read the full article here and download the full survey here (registration required).

Share/Save/Bookmark

Error in SSIS with Oracle Date Columns

Posted on the July 31st, 2008. Read 91 times

Source: RDA Corp - Business Intelligence and SQL Server [link]


Just Some Excel Helpers and Respective Unit Tests

Posted on the July 31st, 2008. Read 75 times

Source: Loosely Coupled Human Code Factory [link]

Just some semi-elegant stuff I’ve conjured up recently to get column names in Excel and do some other functional things.  I wrote this code in a true TDD fashion.  It still, for some reason, seems odd to me to write tests in VB, since it was always a language that encouraged dirtiness.  But alas, this is fairly solid stuff.Imports NUnit.Framework

<TestFixture()> _
Public Class HelperTests
<Test()> <Category(”Presenter Helper Tests”)> _
Public Sub TestHelperGetExcelColumnLetterSingleLetter()…(read more)

DBLinks the sequel

Posted on the July 31st, 2008. Read 76 times

Source: Oracle Warehouse Builder (OWB) Weblog [link]


Wal-Mart chooses Oracle Business Intelligence

Posted on the July 31st, 2008. Read 132 times

Source: Frank Buytendijk Blog [link]


DATAllegro to move to Windows/SQL Server after Microsoft buyout

Posted on the July 31st, 2008. Read 94 times

Source: Blogging about all things SAS [link]

Carrying on from my earlier post, there is an  interesting post by the founder of DATAllegro on his blog here, stating that now Microsoft have purchased DATAllegro they will be migrating their appliance from Ingres/Linux to SQL Server/Windows.

Over on the Microsoft BI Blog there is  a post lauding how they have rave reviews for DATAllegro warehouses with data volumes over 200+ TB and amasing query response times.

So what will the move to Widows and SQL Server bring?

What will the migration path be like?

How will customers feel about the requirement to constantly apply windows patches to their “appliance”?

How many 200TB SQL Server/Windowswrehouse sites are out there at the moment?

Given the move from one technology stack to another is always fraught with difficult (just think SAS FM, SPM and ABM) then I would hate to be a DATAllegro customer with a 200TB warehouse at the moment.

Transit :: The Easy Way

Posted on the July 31st, 2008. Read 80 times

Source: Loosely Coupled Human Code Factory [link]

Yesterday I biked it to work as I have been for weeks now.  Except today I rode the old trusty Streetcar & #35 into the office.  On the way home I hit up the #35 as always, but transferred downtown to the MAX to PGE Park.  From there I just walked the rest of the way home, all of a mighty ten minutes. The MAX, was ridiculously packed by the time it reached Pioneer Square, and by the time it reached PGE Park barely anyone else could physically fit in the thing.  I’m glad they…(read more)

Using a DBLink Location for ETL

Posted on the July 30th, 2008. Read 97 times

Source: Oracle Warehouse Builder (OWB) Weblog [link]


User Stories

Posted on the July 30th, 2008. Read 81 times

Source: Loosely Coupled Human Code Factory [link]

The mumbo jumbo of use case, requirements gathering, and all that are a necessary evil of software design.  We developers cannot unfortunately just haphazardly build our UIs and other such things.  It is necessary that in some way the user interface is dictated by the users. In the past, especially with waterfall, this has been a massive failure point for companies having software built.  The UI not reflecting actual use or features of what was actually needed (vs. what is asked for,…(read more)

Five questions with Dwight Mouton

Posted on the July 30th, 2008. Read 73 times

Source: The sascom magazine blog [link]

Dwight Mouton is a Product Manager for SAS Americas.

What are you working on right now?
I’m quite busy right now. I’m finalizing requirements for Marketing Optimization version 5.3., settling in after a move from Winston Salem to Cary, and looking for customers who will discuss Marketing Optimization with me.

What are you reading?
I’m currently reading Lance Armstrong’s War by Daniel Coyle and The Sorcerer’s Stone from the Harry Potter series. Last month I read Competing on Analytics and Super Crunches while traveling. Now I’m trying to decide on my next business-related book – I usually alternate one fiction and one non-fiction. I also read the articles on SAS intranet daily.

What industry trends are you watching?
Since I spent 15 years in banking before coming to SAS in 2007, I’m usually following that industry pretty closely. I’m starting to add Telecom, Retail, and Insurance to my list because they are popular with MO.

What do you do to unwind after a long day’s work?
After a tough day, a good hour-long session of Grand Theft Auto gets my mind off things. To really decompress I love to go for a 30 minute run or an hour bike ride. Once I’m decompressed, hanging out with my family is one of my favorite pastimes.

What music do you like to listen to while exercising?
If I’m running I really enjoy the podcast entitled Podrunners which has different programs depending on your workout level. Since I recently started up running again, I’m on a training plan that will get me prepared for a competitive 5K in 8 weeks. When I’m working with weights I prefer something hardcore; Metallica is my favorite by far.

Data quality: definiciones

Posted on the July 30th, 2008. Read 72 times

Source: Information Management [link]

A

Flood Gates on Operational BI on the Verge Of Breaking Wide Open

Posted on the July 30th, 2008. Read 168 times

Source: Blog: Mike Ferguson [link]

It seems that everywhere I look the number of vendors gearing up for Operational BI is just massing as if awaiting an onslaught on the market. Among them include HP, IBM, InforSense, Progress, Tibco, SL, ThinkAnalytics and many more. Key to this is event driven data integration, in-memory data, predictive analytics and rules engines. Vendors like SL have even released a data cache for in-memory analytics with its RTView product. Progress is also pushing with its Aparma product and long term pioneer ThinkAnalytics are also doing well. Clearly the giants are also moving to get all their pieces in play. IBM’s acquisition of German rules vendor iLOG looks like it could also be used in the world of on-demand and automated decisioning. Also SAP Business Objects Labs have released a prototype on event-driven BI and seem to be partnering with SPSS on predictive analytics.

It’s all heating up, yet in the UK it staggers me the number of companies that are not seeing the benefit of these emerging technologies. Some verticals would reap massive benefits by exploiting this technology. If you are interested in Operational BI drop me an email at ukinfo@intelligentbusiness.biz or check out some of my articles on this either on my B-Eye-Network channel or on my web site http://www.intelligentbusiness.biz/ArticlesSignUpPage.html and http://www.intelligentbusiness.biz/ArchiveArticlesFront.html

Aggregate Facts and Dimensions

Posted on the July 30th, 2008. Read 92 times

Source: Tod means Fox [link]

Aggregate — picture from http://www.stonetohome.comAggregate fact tables are special fact tables in a data warehouse that contain new metrics derived from one or more aggregate functions (AVERAGE, COUNT, MIN, MAX, etc..) or from other specialized functions that output totals derived from a grouping of the base data. These new metrics, called “aggregate facts” or “summary statistics” are stored and maintained in the data warehouse database in special fact tables at the grain of the aggregation. Likewise, the corresponding dimensions are rolled up and condensed to match the new grain of the fact.

These specialized tables are used as substitutes whenever possible for returning user queries. The reason? Speed. Querying a tidy aggregate table is much faster and uses much less disk I/O than the base, atomic fact table, especially if the dimensions are large as well. If you want to wow your users, start adding aggregates. You can even use this “trick” in your operational systems to serve as a foundation for operational reports. I’ve always done this for any report referred to by my users as a “Summary”. (As an aside, there is a difference between an “aggregate” and a “summary”. I’ll explore these differences in my next post.)

For example, take the “Orders” business process from an online catalog company where you might have customer orders in a fact table called FactOrders with dimensions Customer, Product, and OrderDate. With possibly millions of orders in the transaction fact, it makes sense to start thinking about aggregates.

To further the above example, assume that the business is interested in a report: “Monthly orders by state and product type”. While you could generate this easily enough using the FactOrders fact table, you could likely speed up the data retrieval for the report by at least half (but likely much, much more) using an aggregate.

Here, using the atomic transaction FactOrders table:

SELECT c.STATE, p.product_type, t.YEAR, t.MONTH, SUM(f.order_amount)
    FROM FactOrders f
    JOIN DimCustomer c ON c.CustomerID = f.CustomerID
    JOIN DimProduct p ON p.ProductID = f.ProductID
    JOIN DimTime t ON
        t.YEAR = DATEPART(yy, f.YearMonthID) AND
        t.MONTH = DATEPART(mm, f.DateID)
    GROUP BY c.STATE, p.product_type, t.YEAR, t.MONTH

The aggregate is querying much less data and queries against time are now much simpler. In my non-scientific tests, the following query ran many times faster (a few seconds compared to about 30 seconds!).

SELECT c.STATE, p.product_type, t.YEAR, t.MONTH, SUM(f.order_amount)
    FROM FactOrders_Agg1 f
    JOIN DimCustomerState c ON c.CustomerStateID = f.CustomerStateID
    JOIN DimProductType p ON p.ProductTypeID = f.ProductTypeID
    JOIN DimMonth t ON t.YearMonthID = f.YearMonthID
    GROUP BY c.STATE, p.product_type, t.YEAR, t.MONTH

Creating the Fact and Dimensions

To implement, you will need to roll up your fact table by the hierarchies found in your dimensions. The result will be a new fact table, a set of new accompanying dimensions at the grain of the fact, and all new foreign keys for mapping. I usually name the fact table the same as the base fact with some meaningful suffix appended to the end. In SSMS, this keeps the aggregates with the fact in my object explorer. Dimensions usually get new names (like CustomerState and ProductType) and should be conformed so that they can be reused across business processes. You could even create views instead of new dimensions, but this does not eliminate the need to regenerate new surrogate keys.

When rolling up dimensions, you are provided with an excellent opportunity to perform aggregate functions on the dimension itself and store the results as new attributes. For example, you may want to know how many customers are living in each state. This could be used as the denominator in some population calculation you plan to use against the aggregate fact. Your new dimension might therefore look like the following:

SELECT Cust_Country, Cust_Region, Cust_State, COUNT(Cust_ID)
FROM DimCustomer
GROUP BY Cust_Country, Cust_Region, Cust_State

The most obvious aggregate function to use is COUNT, but depending on the type of data you have in your dimensions, other functions may prove useful. Just be warned: If you find that aggregate functions are being used a lot on your dimensions, you may need to revisit your design. There may be opportunities to pull out those metrics into existing or new fact tables!

Generating aggregates is largely an incremental process, where you examine query and reporting usage looking for places to improve performance. Aggregates stored in the RDBMS are maintained through ETL and/or your OLAP engine.

A Note About OLAP Aggregates

Theoretically, storing aggregates in a fact table in a RDBMS is the same as storing them in an OLAP cube. In OLAP storage, aggregates are precalculated summaries of data from the different dimensions of the cube, such that a query that seeks to know the aggregate (sum) of some metric (order amount) for X (customer state) and Y (product type) over T (monthly orders) would only need to look inside the cube at those exact coordinates to get the answer. I won’t pretend to know how this stuff is physically stored, but OLAP engines across the board offer better performance, management, and navigation mechanisms for the aggregations than is available through the RDBMS (even when using Indexed or Materialized Views).

Next post, I’ll write some thoughts on the differences between “Summaries” and “Aggregates”!

Many-to-Many Dimension bug

Posted on the July 30th, 2008. Read 64 times

Source: Chris Webb's BI Blog [link]

Jon Axon mailed me recently with an interesting bug he’d come across concerning many-to-many dimensions, where it looks like the AS engine is trying to be a bit too clever for its own good as far as query optimisation goes. It’s reproducible on AS2005 SP2 and Katmai RC0, and I logged it on Connect here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357469

Here’s an example on Adventure Works that Jon gave me. The following query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]

Shows the total number of customers as being 18484. Looking at the Sales Reason dimension (which has a many-to-many relationship with the measure group Customer Count is from), if you run a relational query on the underlying data source you can see that not every order was associated with a sales reason - this is the key point here. Now if we run the query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons]

We can see that it returns the same result as the first query, 18484, as you would expect. But what if we want to find the number of customers who specified a sales reason? You would think that the following query would do that:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE [Sales Reason].[Sales Reason].[All Sales Reasons].Children

But it doesn’t, it returns 18,484 again. However if you run this query which should be equivalent to the previous query:

SELECT
[Measures].[Customer Count] on 0
FROM [Adventure Works]
WHERE {[Sales Reason].[Sales Reason].&[1],
       [Sales Reason].[Sales Reason].&[2],
       [Sales Reason].[Sales Reason].&[3],
       [Sales Reason].[Sales Reason].&[4],
       [Sales Reason].[Sales Reason].&[5],
       [Sales Reason].[Sales Reason].&[6],
       [Sales Reason].[Sales Reason].&[7],
       [Sales Reason].[Sales Reason].&[8],
       [Sales Reason].[Sales Reason].&[9],
       [Sales Reason].[Sales Reason].&[10]}

It returns the value we’re looking for, 17022. Not good! What I suspect is happening here is that when the query optimiser sees the expression [Sales Reason].[Sales Reason].[All Sales Reasons].Children in the Where clause, it assumes that it’s equivalent to [Sales Reason].[Sales Reason].[All Sales Reasons] which it is in most cases, just not here. Interestingly if you look in Profiler and run the queries on a cold cache, you see exactly the same activity when [Sales Reason].[Sales Reason].[All Sales Reasons].Children is in the Where clause as when [Sales Reason].[Sales Reason].[All Sales Reasons] is there; but when you explicitly list all of the individual sales reasons, as in the last query, because Sales Reason has to resolve itself through the Internet Sales Order Details dimension which is ROLAP, you see SQL queries being fired off against the relational database.

Scripts for loading up DBGEN TPC-H data

Posted on the July 30th, 2008. Read 79 times

Source: oramoss oracle [link]

If you’re interested in creating a TPC-H schema for testing purposes, then the following scripts may be of use to you:

Unix Scripts:
Multi Load TPCH
Load TPCH Stream

SQL*Loader Control files:
REGION Table
NATION Table
SUPPLIER Table
CUSTOMER Table
PART Table
PARTSUPP Table
ORDERS Table
LINEITEM Table

You may wish to read and check them before you use them - they’re not exactly rocket science but they seem to do the job.

I have all the files in the same directory for simplicity sake.

I then use them to create a scale factor 1 target TPC-H schema using the following calls:

# “l” loads the REGION and NATION tables
./multi_load_tpch.sh 1 l “tpch/tpch@test” 1

# “s” loads the SUPPLIER table
./multi_load_tpch.sh 1 s “tpch/tpch@test” 10

# “c” loads the CUSTOMER table
./multi_load_tpch.sh 1 c “tpch/tpch@test” 10

# “p” loads the PART and PARTSUPP tables
./multi_load_tpch.sh 1 p “tpch/tpch@test” 10

# “o” loads the ORDERS and LINEITEM tables
./multi_load_tpch.sh 1 o “tpch/tpch@test” 10

Obviously, you need to change the connection string to match your environment.

Caveats:

  1. Obviously, they are supplied as is - use at your own discretion and risk.
  2. You need to have created the target schema tables and made sure they are empty as the SQL*Loader control files use APPEND.
  3. Bear in mind that choosing too high a number of parallel streams (the last parameter in the calls) will overload your machine so try and balance it against the available system resources.

Bugs, issues or questions, please get in touch…enjoy.

Product Volatility: Optimizing MDX with MDX Studio

Posted on the July 29th, 2008. Read 79 times

Source: Microsoft OLAP by Mosha Pasumansky [link]

In this article we will show how to take typical, everyday MDX and optimize it by simply following advices of MDX Studio Analyzer.

As a starting point, I will take “Product Volatility” example as published by Thomas Ivarson in his blog. It is a perfect example, because Thomas took a real business problem, and implemented it using very clean and straightforward MDX, just like I expect many MDX practitioners do. This MDX can be optimized, but some of these optimizations might not be obvious. Running MDX Studio Analyzer and following its advices takes the guesswork out of equation.

Here is the MDX that computes total number of products, number of products with sales increased from last month and number of products with sales decreased from last month:

WITH
  MEMBER Measures.TotNumberOfProducts AS
    Count
    (
      Descendants
      (
        [Product].[Product Categories].CurrentMember
       ,[Product].[Product Categories].[Product]
      )
    )
  MEMBER Measures.NumberOfSoldProductsIncreasing AS
    Count
    (
      Filter
      (
        NonEmpty
        (
          Descendants
          (
            [Product].[Product Categories].CurrentMember
           ,[Product].[Product Categories].[Product]
          )
         ,[Measures].[Internet Sales Amount]
        )
       ,
            (
              [Date].[Calendar].CurrentMember
             ,[Measures].[Internet Sales Amount]
            )
          -
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )
        > 0
      )
    )
  MEMBER Measures.NumberOfSoldProductsDecreasing AS
    Count
    (
      Filter
      (
        NonEmpty
        (
          Descendants
          (
            [Product].[Product Categories].CurrentMember
           ,[Product].[Product Categories].[Product]
          )
         ,[Measures].[Internet Sales Amount]
        )
       ,
            (
              [Date].[Calendar].CurrentMember
             ,[Measures].[Internet Sales Amount]
            )
          -
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )
        < 0
      )
    )
SELECT
  {
    [Measures].[TotNumberOfProducts]
   ,[Measures].[NumberOfSoldProductsIncreasing]
   ,[Measures].[NumberOfSoldProductsDecreasing]
  } ON 0
 ,[Date].[Calendar].[Month] ON 1
FROM [Adventure Works]

Executing this query with hot cache in MDX Studio produces the following perfmon counter statistics:

Time             : 359 ms

Calc covers      : 85

Cells calculated : 7672

Sonar subcubes   : 79

SE queries       : 76

Cache hits       : 76

Cache misses     : 3

Cache inserts    : 3

Cache lookups    : 79

The time looks good, but only because both the cube and the query are small. The number of SE queries should raise a red flag here. With only 114 cells in the result, why there were 76 SE queries ? Obviously, on more serious cubes, this will become a major performance issue. So, let’s run MDX Studio Analyzer and see what it has to say about this query. We get back the following list:

Line Col Message Link
3 6 If you are trying to count number of members in current selection - consider introducing special measure group for this dimension More Info
7 41 MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect More Info
12 6 Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) More Info
20 45 MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect More Info
27 34 MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect More Info
32 34 MDX function ‘PrevMember’ may raise an error or produce non-desired result when user applies multiselect More Info
39 6 Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) More Info
47 45 MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect More Info
53 14 Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache  

Let’s go over this list and fix issue by issue.

Line 3, Col 6: If you are trying to count number of members in current selection - consider introducing special measure group for this dimension

The link points to “Counting days in MDX” blog. While we count here Products and not Days, the same technique applies here. These types of questions are best done not in MDX, but by introducing measure group, which only contains Products dimension, and single measure [Number Of Products] of type Count.

Line 7, Col 41: MDX function ‘CurrentMember’ may raise an error or produce non-desired result when user applies multiselect

Now, this is no longer a problem because of how we solved the previous warning ! Indeed, if we wanted to see total number of products across Bikes and Accessories categories, we would’ve added the following WHERE clause:

WHERE {[Product].[Category].&[1],[Product].[Category].&[4]}

It fails with the original query, but works fine with Count measure [Number of Products]

Lines 12/39, Col 6: Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL)

This is a common construct, and as shown in the link to the “Optimizing Count(Filter(…)) expressions in MDX”, it performs much better if rewritten to use Sum. But before we do this rewrite, let’s take a look at the next warning:

Line 53, Col 14: Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache

If we click on this message, the expression in question is the following:

            (
              [Date].[Calendar].CurrentMember
             ,[Measures].[Internet Sales Amount]
            )
          -
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )

Indeed, this expression represents Sales Growth, so we can isolate it into separate calculated member. But before we write the exact expression for the calculated member, let’s look deeper into the original expression. It features Count(Filter(NonEmpty(…))) construct, i.e. only products which had sales in current month are considered for increasing/decreasing calculation. I.e. even if product had sales in the previous month, but not in the current month, it won’t be counted in the number of products with decreasing sales. In order to repeat the same logic, the expression would be

  [Sales Growth] =
    IIF
    (
      IsEmpty([Measures].[Internet Sales Amount])
     ,NULL
     ,
        [Measures].[Internet Sales Amount]
      -
        (
          [Date].[Calendar].PrevMember
         ,[Measures].[Internet Sales Amount]
        )
    ) 

For more details why IIF(cond, NULL, exp) is good for performance, read “Performance of IIF function in MDX” blog. Now, putting all of together so far, we will get the following query:

WITH
  MEMBER [Sales Growth] AS
    IIF
    (
      IsEmpty([Measures].[Internet Sales Amount])
     ,NULL
     ,
        [Measures].[Internet Sales Amount]
      -
        (
          [Date].[Calendar].PrevMember
         ,[Measures].[Internet Sales Amount]
        )
    )
  MEMBER PositiveGrowth AS
    IIF
    (
      [Measures].[Sales Growth] > 0
     ,1
     ,NULL
    )
  MEMBER Measures.NumberOfSoldProductsIncreasing AS
    Sum
    (
      Descendants
      (
        [Product].[Product Categories].CurrentMember
       ,[Product].[Product Categories].[Product]
      )
     ,[Measures].[PositiveGrowth]
    )
  MEMBER NegativeGrowth AS
    IIF
    (
      [Measures].[Sales Growth] < 0
     ,1
     ,NULL
    )
  MEMBER Measures.NumberOfSoldProductsDecreasing AS
    Sum
    (
      Descendants
      (
        [Product].[Product Categories].CurrentMember
       ,[Product].[Product Categories].[Product]
      )
     ,[Measures].[NegativeGrowth]
    )
SELECT
  {
    [Measures].[Number Of Products]
   ,[Measures].[NumberOfSoldProductsIncreasing]
   ,[Measures].[NumberOfSoldProductsDecreasing]
  } ON 0
 ,[Date].[Calendar].[Month] ON 1
FROM [Adventure Works];

When executed in MDX Studio, we get the following perfmon statistics:

Time             : 171 ms

Calc covers      : 15

Cells calculated : 1993

Sonar subcubes   : 6

SE queries       : 3

Cache hits       : 7

Cache misses     : 5

Cache inserts    : 5

Cache lookups    : 12

This is now much better than before, there are only 3 SE queries. All of the performance warnings have been addressed, but we still have warnings about CurrentMember not working well with multiselect. Indeed, when we put multiselect on product categories on Bikes and Accessories, we get errors for number of increasing and decreasing products. Links to “Writing multiselect friendly MDX calculations” and “Multiselect friendly MDX for calculations looking at current coordinate” blogs are helpful to understand what the problem is. Unfortunately, there is no universal solution, and each case needs to be evaluated differently.

In this scenario, we can use apply to Sum(Descendants(…), exp) very similar technique that we applied to Count(Descendants(…)). The idea is not to use Descendants function, which forces us to use CurrentMember, which in turn fails on multiselect, but to put the expression “exp” directly on the Product attribute. In order to do that, we need to create fake measure in the measure group - “Num Products Increased”. It is bound to NULL expression, so this measure will always be NULL, and therefore won’t take any space in storage. But since this measure will have aggregation function Sum, it will properly aggregate when assigned expressions inside MDX Script.

We will use MDX Studio to emulate MDX Script too (MDX Studio has a feature, where it automatically uses cube selected in the dropdown list to resolve current cube used in script statements).

Here is the MDX:

CREATE
  [Sales Growth] =
    IIF
    (
      IsEmpty([Measures].[Internet Sales Amount])
     ,NULL
     ,
        [Measures].[Internet Sales Amount]
      -
        (
          [Date].[Calendar].PrevMember
         ,[Measures].[Internet Sales Amount]
        )
    );
(
  [Product].[Product Categories].[Product]
 ,[Measures].[Num Products Increased]
) =
  IIF
  (
    [Measures].[Sales Growth] > 0
   ,1
   ,NULL
  );
SELECT
  {
    [Measures].[Number Of Products]
   ,[Measures].[Num Products Increased]
  } ON 0
 ,[Date].[Calendar].[Month] ON 1
FROM [Adventure Works]
WHERE
  {
    [Product].[Category].&[1]
   ,[Product].[Category].&[4]
  };

It’s performance characteristics are even slightly better than the query before:

Time             : 31 ms

Calc covers      : 6

Cells calculated : 76

Sonar subcubes   : 1

SE queries       : 1

Cache hits       : 2

Cache misses     : 0

Cache inserts    : 0

Cache lookups    : 2

Not to mention the fact that it works just fine with multiselect on products ! This is a great example, where fixing MDX to work properly in multiselect scenarios, has the side effect of improving performance.

We still have warnings about PrevMember with respect to multiselect on the Time dimension. It is more difficult problem, since the definition of “Sales Growth” when there is multiselect on the Time dimension is ambiguous.

Where to find BI blogs

Posted on the July 29th, 2008. Read 93 times

Source: The sascom magazine blog [link]

I still owe you a running list of business intelligence and analytics blogs, but I’m throwing my hands up in defeat. There are too many, and the content is too varied to come up with a good, focused list. So, instead of listing individual bloggers, I’m going to list sites that aggregate or host multiple business intelligence blogs.

If you’re just starting to search out blogs in the BI space, consider this your starting point. Find a few bloggers you like at these sites, then check out their sidebars to see what other blogs they read. Before you know it, you’ll have your own list of BI blogs that’s too long to manage.

  • BI Blogs: "Bringing together Business Intelligence voices from across the web." This site is like the reader’s digest of BI blogging. It aggregates content from almost 100 different BI blogs all around the blogosphere. You can read or subscribe to everyone’s blogs from the single stream or click through to discover dozens of new blogs. Fill in the contact form to be included. 
  • BeyeBLOGS: "Your voice in the business intelligence community." Part of the BeyeNETWORK, this blogging platform is open to anyone who wants to blog about business intelligence and related topics. Start by browsing the complete list of BeyeBLOGs, or sign up for your own blog for free.
  • Intelligent Enterprise Weblogs: Their individual blogs are listed as channels. I tend to read a lot of Cindi Howson, Seth Grimes and Neil Raden
  • IT Toolbox BI blogs: "A professional blogging community of technology CEOs and executives, speakers, and industry gurus." I mentioned IT Toolbox in my list of BI communities. Their blog platform is also open to anyone who wants to blog, and I like that their blog listings identify inactive bloggers and popular posts. Start your own blog here. 

As always, send me any additions you might have. Or take a minute to include your favorite BI blogs in the comments. I know there are a lot of good blogs that aren’t covered by the sites on this list. 

Advantages/Disadvantages to Commercial Open Source

Posted on the July 29th, 2008. Read 102 times

Source: Business Intelligence in Real Life [link]

I stumbled across a good article by the Business Intelligence and Performance Management Institute today that talks about Open Source Business Intelligence. While the information in it was very good, I have some things to point out.

First things first, here is what they outline as the advantages of Open Source BI

  • Low Cost
  • !supportLists]–>Quicker fixing of Bugs
  • Easier to integrate with the other applications as the API used are not proprietary.

class=”MsoNormal”>Then the article goes on to list the disadvantages, this is the funny part…my comments are in green:

- Its end to end BI capabilities and robustness has not been proven over a time-span and across various organizational scenarios. (Actually, many companies use Open Source BI. BI is a very delicate subject for companies to talk publicly about because BI is almost always viewed as a strategic competitive advantage. Companies implement BI to do one of two things, lower costs or increase revenues, hence making them much more competitive. Think about this, if they can achieve this “competitive advantage” at about 20% the cost of proprietary BI. Not sure if you are familiar with game theory, but game theory basically says that as long as you mimic what your competitors are doing you will remain in the same spot, so if you are ahead and always mimic them strategically, you will always stay ahead, so on and so forth. Take this game theory and apply it to BI, if you can mimic how your competition is using BI, however, can do it at 20% of the cost, you will increase your gain on your competition. HMMM, no wonder companies don’t want to publicize what they are doing with Open Source BI.)

upportLists]–>- The ease of use of proprietary BI platforms, and user friendly documentation, though, open source has more comprehensive and transparent documentation (I’m confused, doesn’t your “though” comment nullify what you just said?)

!supportLists]–>- The Enterprise BI platform providers bring in a whole framework of BI implementation, which is a critical element of success. (Read my point for the first disadvantage)

/p>

Some companies are willing to share how they are using Open Source BI, for example Mozilla

Here is a link to the complete article:

http://www.bipminstitute.com/template/content_desc.php?content_id=867

Video Tutoriales de Pentaho Kettle

Posted on the July 29th, 2008. Read 85 times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]

Este verano… para no aburrirse los que tengan un poco de tiempo… viene con tutoriales y videos de ejemplo para seguir aprendiendo… en este caso del ETL de Pentaho.. Kettle

Acceder a Video Tutoriales

Kettle

BI / DW Project Documentation - The Power of a Wiki

Posted on the July 29th, 2008. Read 109 times

Source: Blogging about all things SAS [link]

Great post over on Hexware’s blog  with recommendations on what you should be documenting as part of your BI / Data Warehouse project.

I know of a couple of SAS Data Warehouse sites that are using a Wiki as a collaborative environment for documentation and we are using Mediawiki on a SAS BI/DW project I am working on at the moment.

We are using the Wiki to document all our processes, development standards, business rules and content (when we get time of course).

One of the things I struggle with is how to integrate the technical metadata, that resides in the SAS Metadata Repository, with the business metadata we are storing in the Wiki.

To that end at sasInct we are developing a web service that will interogate the SAS Metadata Repository and then display the results via a Mediawiki plugin.  The goal is to enable the technical metadata to be streamed to the Wiki in realtime and remove the need to copy and paste etc, or maintain two version.

We are also starting to make use of the description fields in Information Maps to provide business metadata to users when running a report, but we haven’t come up with an approach yet of how to populate that from the Wiki, or how to enable a user to click on an object in a report and open the relevant Wiki page with the business metadata.

Feel free to comment if you have done something cool in this space.

Next Page »