BI Blogs

Bringing together Business Intelligence voices from across the web

15 cuestiones que deberias preguntarte antes de construir un Cuadro de Mando

Posted on the July 31st, 2007. Read times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]

Cuadros de Mando

Interesantisimo articulo de Rabi Islam, responsable técnico de Dundas (que probablemente sea la compañía que lleva la espectacularidad de los cuadros de mando a su punto mas alto).

Esto es lo que nos cuenta, y en lo que no podemos estar mas de acuerdo:

1. ¿Para que me servirá el Cuadro de Mando?
2. ¿Cuál es el público objetivo del Cuadro de Mando?
3. ¿El mismo informe debe mostrar vistas customizadas en función del usuario?
4. ¿Qué estamos midiendo?
5. ¿Qué acciones queremos tomar para cada informe concreto?
6. ¿Quién es el reponsable de gestionar las tareas, eventos y procesos?
7. ¿Cada inform o vista se complementa con datos y series históricas?
8. ¿Estan las variables claramente definidas, incluyendo agregadas, ratios y formulas?
9. ¿Qué tipos de elementos -tablas, graficos, mapas- contendrá el Cuadro de Mando?
10. ¿Cuál es la tecnología que mejor se ajusta a tus requerimientos?
11. ¿Deberiamos mostrar los flujos de procesos en sus correspondientes vistas?
12. ¿Refleja el Cuadro de Mando la ‘foto completa’ de nuestro negocio?
13. ¿Incluye el Cuadro de Mando los elementos visuales y de texto que expliquen el mismo?
14. ¿Tiene mi Cuadro de Mando un aspecto visual llamativo y acorde a lo que quiero mostrar?
15 ¿Ha decidido un ‘template’ o diseño común a todos sus Cuadros de Mando?

Sin duda, unos consejos que deberiamos seguir siempre.

Flaking it - there’s more to life than stars

Posted on the July 31st, 2007. Read times

Source: Pete-s random notes [link]

Conventional wisdom says that users think dimensionally and fully denomalised (Star schemas) fit well with both query tools and user thinking. But sometimes something can be said about moving to a StarFlake design especially if allows attributes to be hidden from groups of users who should not see such things

Panorama: One Year On

Posted on the July 31st, 2007. Read times

Source: Chris Webb's BI Blog [link]

After the Proclarity acquisition last year many people, myself included, thought the writing was on the wall for Panorama: after all the two companies’ product lines were very similar and if Proclarity was going to be supplied by Microsoft then there wasn’t going to be much point in looking at Panorama any more. However, one of the things that surprised me at the Microsoft BI Conference a few months back was the big display that Panorama put on. A big stand in a central location with leggy girls luring the attendees (at least the male ones) to it, a party at the top of the Space Needle - this was a company that was not dead, or at least making a pretty big effort to show that it wasn’t dead.

Of course the real test is whether the software is any good or not. In keeping with their bullish mood Panorama were handing out DVDs with their Novaview suite pre-installed on a virtual machine (you can download it here: http://challenge.panorama.com/forms/default.aspx) and since it was a while since I’d last seen what they’d got to offer (and after some encouragement from their marketing department) I was curious to check it out. After a lot of false starts surrounding licence keys etc and help from Panorama - and I have to say that in my experience their pre-sales support has always been very good - I got it running on my laptop and thought I’d share my impressions here.

I have to admit I was disappointed with the Novaview Desktop tool at first: it didn’t look as if anything had changed in the last few years. The UI looks very outdated in a VB6 way and while it’s easier to use than I remember, I’m still not convinced that it’s as intuitive as it should be. Even worse, within a few minutes of using it I got one of those unhandled (although unfatal) error messages that Panorama was always famous for - which simply isn’t acceptable today. These might be purely cosmetic points but users are more interested in this type of thing than some obscure aspect of MDX generation. After a bit more time though I felt more positive. For the power user it does pretty much everything that you’d ever want it to do, such as enter your own MDX, do writeback, advanced filtering, creating calculations; in fact it clearly does what every demanding customer has ever wanted it to do so it’s more than likely able to meet any obscure querying requirements you have.

I took my misgivings back to Panorama and they told me that the desktop client will be dropped in the next release, due Q1 2008, and from that point there will only be an AJAX-based zero-footprint client and a rich client based on Adobe Flex. It’s interesting that the latter is not based on .NET or even Silverlight; not only a move away from the desktop, which is only to be expected, but a move away from the Microsoft dev platform which I suppose makes sense given Panorama’s repositioning of itself away from being a purely Microsoft partner to working with SAP as well. Presumably it will have much the same capabilities as the current desktop client but look rather better… 

In the meantime, the comparison between Novaview Desktop and ProClarity Desktop is one that has been made a lot in the past and is probably worth making again. In terms of querying functionality they’re neck-and-neck but in my opinion Proclarity is easier to use and looks marginally better, although it is still falls way short of what I’d expect from a modern BI tool. The problem with the Proclarity Desktop tool is that is has it has been declared dead by Microsoft: the last official release was made late last year and it seems there’s no place for a rich client in the bright shiny web-enabled PerformancePoint future. I’ve heard various rumours about it such as that it was going to be turned into an Excel addin or, more recently that it was going to be rewritten in .NET, but that lack of any clear direction from Microsoft on its future is a bit frustrating. I suspect that Microsoft have decided that Excel is the only desktop tool anyone is going to need, but I think there’s still a role for a dedicated client for power users and I know a lot of other Proclarity users feel the same way. If you’re in the market for a full-featured AS desktop tool then there’s little point in going with Proclarity Desktop now, so Novaview wins by default given that there’s a clear roadmap for its future.

As far as the web-based querying and dashboarding functionality goes there is clear competition with what Panorama and PerformancePoint have to offer. A lot of companies will simply go with the Microsoft offering simply because it’s from Microsoft, and that’s a perfectly valid decision; anyway, given the lineage of the product it’s going to be a lot better than a typical Microsoft version 1.0. Why choose Panorama then? Panorama claim to be a lot more scaleable on the web than the old Proclarity equivalents; it’s probably too early to tell whether the same will be true of PerformancePoint though. The cross-platform capabilities will probably be the key: from what I’ve seen in my Analysis Services consultancy work, a lot of companies using AS also use SAP BW and a common front-end for both could be an enticing prospect. I’m also told that Panorama will be building functionality behind PerformancePoint to enable import of data from other platforms and in front of it to enable integration with other applications and business processes - the kind of value-add stuff that Microsoft can’t offer because of its longer release cycles and limitations on who they can partner with.

So, it looks like Panorama have got a future after all. Just as the Microsoft’s entry into the OLAP server market didn’t lead to the immediate bloodbath among rival OLAP vendors that many predicted, so its entry into the client tool market hasn’t (yet) killed off the old third-party client tools market. The delay in getting PerformancePoint released after the Proclarity acquisition and the fact that most companies are a long way from rolling Office 2007 onto their desktops means that companies like Panorama have had a chance to work out a survival strategy - and the choice this means can only be a good thing for us end users.

Multiple Stack Complexity Vs SOA Simplicity

Posted on the July 31st, 2007. Read times

Source: Blog: Mike Ferguson [link]

Over the last several months in customer assignments associated with deployment of BI web services and Data Integration web services I keep running into the same problem that everone seems to be struggling with when it comes to implementing a Service Oriented Architecture.

That problem is one of multiple integration stacks. As I consult on a Europe wide basis I find this problem everywhere. Companies want to standardise on a single common set of infrastructure software which includes common BI infrastructure (gradual evolution to common BI platform) and common Business Integration infrastructure (portal, process management, ESB, ESR, single sign-on etc.). Yet as they buy packaged applications particularly from SAP and Oracle and upgrade to Office 2007 only to find SharePoint coming in though another door, they find themselves with multiple software infrastructure stacks in the enterprise and additional complexity they had not planned for. Plugging together multiple stacks was not on the original agenda. Do you recognise this porblem?

It seems strange that the attraction of SOA is simplicity and flexibility and yet as you try to get closer to that you end up with more complexity. BI and data integration services in a SOA are much sought after but I found myself recently having to deal with questions like “Which enterprise servive registry should be use for managing BI services?” and also “should be define our business processes on our IBM WebSphere BPM software or the SAP Netweaver BPM software or the Oracle Fusion one?”.

There are many more of these kinds of questions that companies are struggling with at present. It seams that vendors are forcing additional infrastructure on enteprises rather than offering flexibility to run on the infrastructure of choice that the customer wants. Yet the business case to executives is that SOA is needed because it facilitates consolidatation of IT infrastructure to reduce complexity and total cost of ownership. As one executive said to me recently “I am struggling to see the benefit”. I must admit that I found it difficult to disagree with him

Opps.

Posted on the July 30th, 2007. Read times

Source: Pete-s random notes [link]

Foolishly I deleted my pending mail file - so if I owe you email then mail me again.

Executive Dashboards & Presentation-class Reports

Posted on the July 30th, 2007. Read times

Source: Calumo - Business Intelligence and Performance Management Blog [link]

Just for fun I created the AFL ladder below using CALUMO to illustrate two of our
business intelligent visualization widgets.

In the first column of the table, we’re using a win/loss bar chart to show each team’s
record for the 17 rounds of the 2007 AFL season. A blue bar represents a win, a red
bar a loss and a blank bar a draw. With 16 teams and 17 rounds played, the bar chart
effectively shows the visual equivalent of 272 data points, maximizing data density
without compromising readability or ease of use.

Let’s take a look at these results: Geelong took 5 rounds to warm up, but definitely
seem to be on a winning roll now. Collingwood & Fremantle both seem to have a
“win at home, loose away from home” problem. After winning six games straight, West
Coast started paying the price for not being able to play their best football.

CALUMO Visualization Widgets provide an intelligent visual way of presenting large
amounts of meaningful data for identifying trends, comparing data and highlighting
exceptions.

In the last column of the table, we compare the percentage wins using a simple relative
sized horizontal bar combined with numeric value percentage wins. By combining charts
and data together in a table, we do away with the need to separately label axes and
legends, or the need to associate the table data with the chart information. In order
to achieve this, CALUMO Visualization Widgets provide in-cell charting for
presentation class dashboards and charts.

With the click of a mouse, CALUMO Visualization Widgets make the design of really
compelling executive dashboards easy.

Business Intelligence and Performance Management Home

Unstructured Information in BI - Only for Spooks? What about Business Analysts?

Posted on the July 29th, 2007. Read times

Source: Cyril on Business Intelligence [link]

There’s a big marketing and consultant push on UIMA, unstructured information management architecture. But I think it is largely missing the point for the real world corporate BI people, i.e. those not spooks or librarians. The critical concept, ignored by many, is that unstructured information is of two kinds, explicit and tacit. Even Wikipedia gets it wrong, ignoring the latter.

I believe BI gains most from tacit intelligence, but that’s not where the product marketing thrust lies. The importance of tacit unstructured information in BI is summed up well by Timo Elliott in a cartoon and by James Taylor’s recent blog post.

The heavy hitters in BI software, as evidenced by the recent takeover activity, e.g. Business Objects and Inxight for one, are pressing home apparent advantage to be gained by corporations with analysis of masses of emails, news, documents, etc. See, for example, the description of UIMA.

Well and good. But you can’t make a silk purse out of a sow’s ear, as Jonathan Swift said. And you can’t create relevant action oriented information for executives out of data that has no embedded useful information in it. The ocean of documents, with some exceptions, is a BI desert for most companies. But I mix my metaphors.

Basically, I believe that a corporation’s vast compendium of historical documents has little BI relevance. It may be useful to track or assess a person’s background, or to isolate the cause of a problem. But history rarely contains the up-to-date information that’s relevant to managing a business, assessing current performance and finding problems. The real lies in exploiting the tacit stuff.

I’ve quoted Henry Minzberg often before, but it bears repeating, as the message hasn’t yet been fully understood in the mainstream of BI: “The strategic database of an organization is in the minds of its managers, not in the databases of its computers”. This is as true today as it was in 1974. Today, one can add: “Or in the morass of historical documents and emails”.

Of course recent emails and documents often contain important information that can, and should, be part of a BI context; but usually only as the seed for a collaborative knowledge building process. This is the nub of the issue; it’s hard to identify, collate, disseminate and collaborate on tacit unstructured information. Perhaps this is why most authors steer clear of the issue. But we need to address it if we are to be effective. More on this in my next post.

I wrote last year detailing some of my research in the 90s on the subject of “hard” and “soft” information, how valuable it is in many BI contexts particularly CRM, but also how difficult it is to exploit. In this context hard information refers to the structured, numeric, formatted, BI reports. Soft information is the unformatted, unarticulated, information in managers’ and professionals’ minds.

An interesting article by Rick Taylor deals with unstructured information is relevant here. It says, in part:

The key to defining knowledge management is to make sure you are separating “explicit” knowledge from “tacit” knowledge. Explicit knowledge is anything easy to quantify, write down, document or explain. Tacit knowledge is everything else. The knowledge based on ones experiences, and often times, at a subconscious level. It is information that you don’t necessarily know you know until you are reminded of it. If you were asked to write down everything you know, could you do it?

The explicit and tacit labels were used first in this context, I believe, by Nonaka and Takeuchi in The Knowledge-Creating Company.

The BI key questions that arise from this discussion are, I believe:

  1. What are the most useful sources of unstructured information in our business? Explicit or Tacit?
  2. If Explicit, how do we best marshal the information and report it?
  3. If Tacit, ditto?
  4. Is the information we get from our unstructured sources complete, and ready for promulgation, or do we need to amplify or build on it before it’s useful?

I believe that the above analysis outlines the problem of utilizing tacit unstructured information reasonably well. I’ll offer my answers to these issues in the next post.

 

Rittman Mead Is Expanding

Posted on the July 29th, 2007. Read times

Source: Mark Rittman's Oracle Weblog [link]

Jon Mead and I have been running Rittman Mead Consulting for just under six months now, and we’ve worked on projects and training assignments with clients in the UK, Europe, Middle-East and North America. We’re at the point now where the amount of new working coming in is steadily increasing, and we’re therefore starting to look out for experienced Oracle business intelligence & data warehousing consultants who’d like to come and work with us.

What we’re ideally looking for is experienced, articulate developers and consultants with a passion for BI and data warehousing, a deep understanding of the Oracle BI&DW technical stack, excellent communication skills and the ability to both lead and work as part of a team on projects in the UK, Europe and farther afield. You’ll probably already be based in the UK, have papers or articles published, have previously worked in consulting and be an active participant in your local Oracle user group. What we can offer is the ability to work as part of a small, focused team dedicated to excellence in Oracle BI & data warehousing, access to exciting and challenging projects, and the chance to get in at the ground floor with a fast-growing Oracle partner.

As well as recruiting new staff, we are also starting an associate consulting program, where a small number of extremely experienced independent Oracle BI & Data Warehousing consultants work along with us on client projects. Recently joined us on this scheme are Paul Shilling, who Jon and I worked alongside on a large data migration project last year and who is a very experienced business analyst, OWB and Discoverer developer; Borkur Steingrimsson, another very experienced OWB, Discoverer and Oracle BI Suite Enterprise Edition consultant and trainer; and Jeff Moss, who you’ll probably know from his blog, his presentations at the UK Oracle User Group and extensive experience in the field of Oracle data warehouse design, development and performance tuning.

We’re pleased to have Jeff, Borkur and Paul along as part of our “virtual team”, and over time we’re looking to bring Rittman Mead Consulting up to a level of around six or so experienced staff supplemented by our associates who work with us from time to time. If you’re interested in working with myself, Jon and the team, or you’re looking for some help with the delivery of an Oracle BI & Data Warehousing development, drop me a line at mark.rittman@rittmanmead.com or give me a call on +44 7866 568246.

Red color for negative numbers in Analysis Services 2005

Posted on the July 28th, 2007. Read times

Source: SQL BI [link]

Sometimes there is a simple way to solve an issue. For example, if you want to color all measures (including calculated measures) in red when the value is negative, you can simply write:

SCOPE( Measures.AllMembers );
    FORE_COLOR(THIS) = IIF( Measures.CurrentMember < 0, 255, 0);  // 255 = RED
END SCOPE;

 

MDX Script is so better than classical properties for each measure in these cases…


Cross-posted from SQLBlog! - http://www.sqlblog.com

MSOLAP Service Crashes Once a Day - Windows Low Fragmentation Heap vs OLAP Heap

Posted on the July 27th, 2007. Read times

Source: Jesse Orosz: Analysis Services Blog [link]

Every night for the last few months the OLAP service would crash during nightly processing. It got so bothersome we eventually opened a support ticket.
 
Turns out we had to start using Windows Low Fragmentation Heap instead of the OLAP Heap. Word has it that the OLAP Heap isn’t an stable as Windows, especially if you have a large OLAP database (we’re 500GB and I guess that’s considered large).
 
We’ve since upgraded to 16GB of RAM on the box and changed the system file cache to 5GB. No problems since.
 
——————————————————————————————————————————————————–

Problem – Analysis Services 2005 generates thousands of mini dumps when you are processing objects overnight.

Environment – Analysis Services 2005 x64, 8GB of RAM

Root Cause – When the server process objects, it requires more memory, and the server is under memory pressure. The server cache cleanup threads will start cleaning up some cache to free up some memory, but it encounter heap corruption issue and generates the mini dumps every time it encounters the corruption. The server system cache is another cause of the memory pressure. In a 64-bit system, the server opens files in the system cache by default. Over time the system cache will take more and more memory if there is no limit on how much it can take, then it leads to server out of memory. Analysis Services needs to open files in order to read and write data during processing, so the part of the system cache is indirectly used by the Analysis Services.

Resolution – To limit the system cache, you need to change the server properties in the msmdsrv.ini file as the following to limit the system cache to 2.5GB:  

<LimitSystemFileCacheSizeMB>2500</LimitSystemFileCacheSizeMB>

<LimitSystemFileCachePeriod>1000</LimitSystemFileCachePeriod>

 

To fix the OLAP heap corruption issue, you will need to use the following settings to let the Analysis Services to use the Windows Low Fragmentation Heap instead of the OLAP heap.

 

MemoryHeapType: set this to 2

HeapTypeForObjects: set this to 0

 

All these settings requires the OLAP service to restart. 

More interview questions - part 2-ish of a sparse series

Posted on the July 27th, 2007. Read times

Source: Pete-s random notes [link]

Candidates that make stuff up or stretch the truth a bit sometimes give the game away, and not always by incorrect answers. And another good question to spot the thinkers

Metashare

Posted on the July 27th, 2007. Read times

Source: Chris Webb's BI Blog [link]

Mark Garner has just dropped me a mail to let me know that Metashare, a tool he’s been working on for generating metadata documentation from SQL 2005 data warehouses and and SSIS packages, has just got to the beta 1 stage. Here’s his blog entry announcing this:

http://mgarner.wordpress.com/2007/07/27/metashare-beta-1/

…and you can find out more about it, and download it, from Sourceforge:

http://metashare.sourceforge.net/

Playing Around with Star Transformations and Bitmap Indexes

Posted on the July 27th, 2007. Read times

Source: Mark Rittman's Oracle Weblog [link]

I’ve been doing some work this week with a client in London, who wants to put together a reporting data mart based on Oracle Database 10gR1. Although they and I were fairly sure a dimensional database design using bitmap indexes and star transformations would give them the query performance they required, there were a number of questions we had at the start that I wanted to get resolved:

  1. How easy would it be to get a star transformation to take place - in earlier versions of Oracle (8i, 9i) star transformations seemed a bit temperamental and I wanted to see how easily these took place in the latest version of the database.
  2. What do the execution plans look like for queries against the star schema with STAR_TRANSFORMATION_ENABLED set to TRUE and FALSE, how do these compare to execution plans for the same table with regular non-unique B*Tree indexes, and did the queries that used star transformations really return data that much faster?
  3. What was the difference in size, and time to create, for bitmap indexes compared to regular non-unique indexes?
  4. If we had a column with a bitmap index on it and we then updated the column (unusual, but this is one of their requirements), by how much did this increase the size of the index and how fast was the update compared to updating a column with a regular non-unique index on it? Also, if we were running a select against the same table, to what extent would the query be slowed down compared to running the same query against a table containing regular non-unique indexes (i.e. what is the impact of row-locking when updating a column with a bitmap index on it)

So, quite a few questions. In addition, the client expected each query against the fact table, which was likely to contain upwards of 20 or 30 million rows when completed, to only return at most a few thousand records, so we were keen to find out whether star transformations still had the same performance edge when returning only a small percentage of the fact table (as opposed to, say, scanning a large chunk of the fact table and then aggregating the data).

The data itself wouldn’t be available for another week or so, so I created a test user and copied across data from the SH Sample Schema. To create a representative set of data, I copied the SH.SALES table into itself a few times to give it around 7m rows, and created two versions of the table, SALES_IDX on which I’d create regular non-unique indexes and SALES_BIX, which would have the bitmap indexes on its dimension key columns. Then, I copied across the SH.CUSTOMERS, SH.PRODUCTS, SH.TIMES, SH.CHANNELS and SH.COUNTRIES tables across, created primary key constraints and unique indexes on their ID columns, bitmap indexes on their attribute columns, bitmap indexes on the dimension key columns in SALES_BIX and non-unique indexes on the corresponding SALES_IDX columns, and then gathered statistics on the schema in readiness for testing.

SQL> desc sales_idx
 Name                          Null?    Type
 ----------------------------- -------- ------------
 PROD_ID                       NOT NULL NUMBER
 CUST_ID                       NOT NULL NUMBER
 TIME_ID                       NOT NULL DATE
 CHANNEL_ID                    NOT NULL NUMBER
 PROMO_ID                      NOT NULL NUMBER
 QUANTITY_SOLD                 NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                   NOT NULL NUMBER(10,2)

SQL> desc sales_bix
 Name                          Null?    Type
 ----------------------------- -------- ------------
 PROD_ID                       NOT NULL NUMBER
 CUST_ID                       NOT NULL NUMBER
 TIME_ID                       NOT NULL DATE
 CHANNEL_ID                    NOT NULL NUMBER
 PROMO_ID                      NOT NULL NUMBER
 QUANTITY_SOLD                 NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                   NOT NULL NUMBER(10,2)

SQL> select table_name
  2  ,      index_name
  3  ,      index_type
  4  from   user_indexes
  5  where  table_name like 'SALES%'
  6  /

TABLE_NAME           INDEX_NAME           INDEX_TYPE
-------------------- -------------------- ------------
SALES_BIX            TIME_ID_BIX          BITMAP
SALES_BIX            CUST_ID_BIX          BITMAP
SALES_BIX            CHANNEL_ID_BIX       BITMAP
SALES_BIX            PROD_ID_BIX          BITMAP
SALES_IDX            PROD_ID_IDX          NORMAL
SALES_IDX            CUST_ID_IDX          NORMAL
SALES_IDX            TIME_ID_IDX          NORMAL
SALES_IDX            CHANNEL_ID_IDX       NORMAL
8 rows selected.

SQL> begin
  2
  3        DBMS_STATS.GATHER_SCHEMA_STATS (
  4          ownname => 'SH_STAR_TEST',
  5            estimate_percent => 10
  6            );
  7            end;
  8  /

PL/SQL procedure successfully completed.

Note that if you’re familiar with the SH schema you’ll have noticed that I’ve missed out on indexing the PROMOTIONS dimension - I forgot about this during the testing but as the schema doesn’t exist in the test schema I set up, and I don’t reference it in joins, it’ll effectively be ignored and treated as if it’s a fact table measure, so we can disregard it.

So now all the data was there, I ran a few select statements to see how the execution plan came out, firstly with STAR_TRANSFORMATION_ENABLED set to FALSE, and then to TRUE, against both the bitmap-indexed and regular-indexed tables. The results were interesting.

SQL> select sum(amount_sold), count(*)
  2  from   sales_bix s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:00.39

Execution Plan
----------------------------------------------------------
Plan hash value: 3700521350

----------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                           |     1 |    80 |   698   (1)| 00:00:09 |
|   1 |  SORT AGGREGATE                    |                           |     1 |    80 |            |          |
|*  2 |   HASH JOIN                        |                           |   601 | 48080 |   698   (1)| 00:00:09 |
|   3 |    TABLE ACCESS BY INDEX ROWID     | SALES_BIX                 |    20 |   500 |   546   (1)| 00:00:07
|   4 |     NESTED LOOPS                   |                           |  1181 | 75584 |   546   (1)| 00:00:07 |
|   5 |      MERGE JOIN CARTESIAN          |                           |    60 |  2340 |    27   (4)| 00:00:01 |
|   6 |       NESTED LOOPS                 |                           |     2 |    48 |     3   (0)| 00:00:01 |
|*  7 |        INDEX UNIQUE SCAN           | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|*  8 |        TABLE ACCESS FULL           | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|   9 |       BUFFER SORT                  |                           |    30 |   450 |    24   (5)| 00:00:01 |
|  10 |        TABLE ACCESS BY INDEX ROWID | TIMES                     |    30 |   450 |    27   (4)| 00:00:01 |
|  11 |         BITMAP CONVERSION TO ROWIDS|                           |       |       |            |          |
|  12 |          BITMAP AND                |                           |       |       |            |          |
|* 13 |           BITMAP INDEX SINGLE VALUE| CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|* 14 |           BITMAP INDEX SINGLE VALUE| CALENDAR_YEAR_BIX         |       |       |            |          |
|  15 |      BITMAP CONVERSION TO ROWIDS   |                           |       |       |            |          |
|  16 |       BITMAP AND                   |                           |       |       |            |          |
|* 17 |        BITMAP INDEX SINGLE VALUE   | TIME_ID_BIX               |       |       |            |          |
|* 18 |        BITMAP INDEX SINGLE VALUE   | PROD_ID_BIX               |       |       |            |          |
|* 19 |        BITMAP INDEX SINGLE VALUE   | CHANNEL_ID_BIX            |       |       |            |          |
|* 20 |    VIEW                            | index$_join$_002          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 21 |     HASH JOIN                      |                           |       |       |            |          |
|  22 |      BITMAP CONVERSION TO ROWIDS   |                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 23 |       BITMAP INDEX SINGLE VALUE    | CUST_STATE_PROVINCE_BIX   |       |       |            |
|  24 |      INDEX FAST FULL SCAN          | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."CUST_ID"="C"."CUST_ID")
   7 - access("CH"."CHANNEL_ID"=2)
   8 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  13 - access("T"."CALENDAR_MONTH_NUMBER"=11)
  14 - access("T"."CALENDAR_YEAR"=1998)
  17 - access("S"."TIME_ID"="T"."TIME_ID")
  18 - access("S"."PROD_ID"="P"."PROD_ID")
  19 - access("S"."CHANNEL_ID"=2)
  20 - filter("C"."CUST_STATE_PROVINCE"='CA')
  21 - access(ROWID=ROWID)
  23 - access("C"."CUST_STATE_PROVINCE"='CA')

That’s as expected, the query uses the bitmap indexes, does a number of hash joins and nested loop joins and returns the data in 0.39 seconds (not bad for 7m fact table rows) with a cost of 698. Running the query against the fact table with only regular non-unique indexes returns the following autotrace output:

SQL> select sum(amount_sold), count(*)
  2  from   sales_idx s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:20.96

Execution Plan
----------------------------------------------------------
Plan hash value: 2003962144

----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |     1 |    80 |  3175   (1)| 00:00:39 |
|   1 |  SORT AGGREGATE                 |                           |     1 |    80 |            |          |
|*  2 |   HASH JOIN                     |                           |   598 | 47840 |  3175   (1)| 00:00:39 |
|*  3 |    HASH JOIN                    |                           |  1176 | 75264 |  3023   (1)| 00:00:37 |
|   4 |     TABLE ACCESS BY INDEX ROWID | TIMES                     |    30 |   450 |    11   (0)| 00:00:01 |
|   5 |      BITMAP CONVERSION TO ROWIDS|                           |       |       |            |          |
|   6 |       BITMAP AND                |                           |       |       |            |          |
|*  7 |        BITMAP INDEX SINGLE VALUE| CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|*  8 |        BITMAP INDEX SINGLE VALUE| CALENDAR_YEAR_BIX         |       |       |            |          |
|*  9 |     TABLE ACCESS BY INDEX ROWID | SALES_IDX                 | 28620 |   698K|  1504   (1)| 00:00:19 |
|  10 |      NESTED LOOPS               |                           | 57239 |  2738K|  3011   (1)| 00:00:37 |
|  11 |       NESTED LOOPS              |                           |     2 |    48 |     3   (0)| 00:00:01 |
|* 12 |        INDEX UNIQUE SCAN        | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|* 13 |        TABLE ACCESS FULL        | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN          | PROD_ID_IDX               |   102K|       |   214   (2)| 00:00:03 |
|* 15 |    VIEW                         | index$_join$_002          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 16 |     HASH JOIN                   |                           |       |       |            |          |
|  17 |      BITMAP CONVERSION TO ROWIDS|                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 18 |       BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX   |       |       |            |          |
|  19 |      INDEX FAST FULL SCAN       | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."CUST_ID"="C"."CUST_ID")
   3 - access("S"."TIME_ID"="T"."TIME_ID")
   7 - access("T"."CALENDAR_MONTH_NUMBER"=11)
   8 - access("T"."CALENDAR_YEAR"=1998)
   9 - filter("S"."CHANNEL_ID"=2)
  12 - access("CH"."CHANNEL_ID"=2)
  13 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  14 - access("S"."PROD_ID"="P"."PROD_ID")
  15 - filter("C"."CUST_STATE_PROVINCE"='CA')
  16 - access(ROWID=ROWID)
  18 - access("C"."CUST_STATE_PROVINCE"='CA')

OK, so this time the query took around 20 seconds with a cost of 3175, about ten times the cost of the query using bitmap indexes, around what I expected.

Now I enabled star transformations and ran the same queries again. Firstly, the table with bitmap indexes on it:

SQL> alter session set star_transformation_enabled = true;

Session altered.

Elapsed: 00:00:00.00
SQL> select sum(amount_sold), count(*)
  2  from   sales_bix s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:01.08

Execution Plan
----------------------------------------------------------
Plan hash value: 3776350793

----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                           |     1 |    31 |   201   (5)| 00:00:03 |
|   1 |  SORT AGGREGATE                      |                           |     1 |    31 |            |          |
|   2 |   NESTED LOOPS                       |                           |    60 |  1860 |   201   (5)| 00:00:03 |
|*  3 |    INDEX UNIQUE SCAN                 | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID       | SALES_BIX                 |    60 |  1680 |   201   (5)| 00:00:03 |
|   5 |     BITMAP CONVERSION TO ROWIDS      |                           |       |       |            |          |
|   6 |      BITMAP AND                      |                           |       |       |            |          |
|   7 |       BITMAP MERGE                   |                           |       |       |            |          |
|   8 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|*  9 |         TABLE ACCESS FULL            | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|* 10 |         BITMAP INDEX RANGE SCAN      | PROD_ID_BIX               |       |       |            |          |
|  11 |       BITMAP MERGE                   |                           |       |       |            |          |
|  12 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|  13 |         TABLE ACCESS BY INDEX ROWID  | TIMES                     |    30 |   450 |    11   (0)| 00:00:01 |
|  14 |          BITMAP CONVERSION TO ROWIDS |                           |       |       |            |          |
|  15 |           BITMAP AND                 |                           |       |       |            |          |
|* 16 |            BITMAP INDEX SINGLE VALUE | CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|* 17 |            BITMAP INDEX SINGLE VALUE | CALENDAR_YEAR_BIX         |       |       |            |          |
|* 18 |         BITMAP INDEX RANGE SCAN      | TIME_ID_BIX               |       |       |            |          |
|* 19 |       BITMAP INDEX SINGLE VALUE      | CHANNEL_ID_BIX            |       |       |            |          |
|  20 |       BITMAP MERGE                   |                           |       |       |            |          |
|  21 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|* 22 |         VIEW                         | index$_join$_017          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 23 |          HASH JOIN                   |                           |       |       |            |          |
|  24 |           BITMAP CONVERSION TO ROWIDS|                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 25 |            BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX   |       |       |            |          |
|  26 |           INDEX FAST FULL SCAN       | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
|* 27 |         BITMAP INDEX RANGE SCAN      | CUST_ID_BIX               |       |       |            |          |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CH"."CHANNEL_ID"=2)
   9 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  10 - access("S"."PROD_ID"="P"."PROD_ID")
  16 - access("T"."CALENDAR_MONTH_NUMBER"=11)
  17 - access("T"."CALENDAR_YEAR"=1998)
  18 - access("S"."TIME_ID"="T"."TIME_ID")
  19 - access("S"."CHANNEL_ID"=2)
  22 - filter("C"."CUST_STATE_PROVINCE"='CA')
  23 - access(ROWID=ROWID)
  25 - access("C"."CUST_STATE_PROVINCE"='CA')
  27 - access("S"."CUST_ID"="C"."CUST_ID")

Note
-----
   - star transformation used for this statement

Ok, so the execution time is slightly higher but the cost is lower, 201 to 698. Given the small data set size and the unscientific conditions, I wasn’t too surprised about the slight time discrepency, but the execution plan cost was lower, which is what I expected. I’ll run some more tests later and check that the star transform does infact return the data faster as well as generate a lower-cost plan. Note also the Note in the autotrace output, “– Star transformation used for this statement”.

Now I run the same test on the table with regular non-unique indexes.

SQL> select sum(amount_sold), count(*)
  2  from   sales_idx s, customers c, channels ch,
  3         times t, products p
  4  where  s.prod_id = p.prod_id
  5  and    s.time_id = t.time_id
  6  and    s.channel_id = ch.channel_id
  7  and    s.cust_id = c.cust_id
  8  and    c.cust_state_province = 'CA'
  9  and    t.calendar_year = 1998
 10  and    t.calendar_month_number = 11
 11  and    ch.channel_id = 2
 12  and    p.prod_category_desc = 'Hardware';

SUM(AMOUNT_SOLD)   COUNT(*)
---------------- ----------
        84735.44         56

Elapsed: 00:00:01.88

Execution Plan
----------------------------------------------------------
Plan hash value: 747039580

----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                           |     1 |    31 |  3625   (7)| 00:00:44 |
|   1 |  SORT AGGREGATE                      |                           |     1 |    31 |            |          |
|   2 |   NESTED LOOPS                       |                           |    60 |  1860 |  3625   (7)| 00:00:44 |
|*  3 |    INDEX UNIQUE SCAN                 | CHANNELS_PK               |     1 |     3 |     0   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS BY INDEX ROWID       | SALES_IDX                 |    60 |  1680 |  3625   (7)| 00:00:44 |
|   5 |     BITMAP CONVERSION TO ROWIDS      |                           |       |       |            |          |
|   6 |      BITMAP AND                      |                           |       |       |            |          |
|   7 |       BITMAP MERGE                   |                           |       |       |            |          |
|   8 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|   9 |         TABLE ACCESS BY INDEX ROWID  | TIMES                     |    30 |   450 |    11   (0)| 00:00:01 |
|  10 |          BITMAP CONVERSION TO ROWIDS |                           |       |       |            |          |
|  11 |           BITMAP AND                 |                           |       |       |            |          |
|* 12 |            BITMAP INDEX SINGLE VALUE | CALENDAR_MONTH_NUMBER_BIX |       |       |            |          |
|* 13 |            BITMAP INDEX SINGLE VALUE | CALENDAR_YEAR_BIX         |       |       |            |          |
|  14 |         BITMAP CONVERSION FROM ROWIDS|                           |       |       |            |          |
|* 15 |          INDEX RANGE SCAN            | TIME_ID_IDX               |       |       |    14   (0)| 00:00:01 |
|  16 |       BITMAP MERGE                   |                           |       |       |            |          |
|  17 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|* 18 |         TABLE ACCESS FULL            | PRODUCTS                  |     2 |    42 |     3   (0)| 00:00:01 |
|  19 |         BITMAP CONVERSION FROM ROWIDS|                           |       |       |            |          |
|* 20 |          INDEX RANGE SCAN            | PROD_ID_IDX               |       |       |   215   (2)| 00:00:03 |
|  21 |       BITMAP MERGE                   |                           |       |       |            |          |
|  22 |        BITMAP KEY ITERATION          |                           |       |       |            |          |
|* 23 |         VIEW                         | index$_join$_017          |  3506 | 56096 |   151   (3)| 00:00:02 |
|* 24 |          HASH JOIN                   |                           |       |       |            |          |
|  25 |           BITMAP CONVERSION TO ROWIDS|                           |  3506 | 56096 |     2   (0)| 00:00:01 |
|* 26 |            BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX   |       |       |            |          |
|  27 |           INDEX FAST FULL SCAN       | CUSTOMERS_PK              |  3506 | 56096 |   148   (3)| 00:00:02 |
|  28 |         BITMAP CONVERSION FROM ROWIDS|                           |       |       |            |          |
|* 29 |          INDEX RANGE SCAN            | CUST_ID_IDX               |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CH"."CHANNEL_ID"=2)
   4 - filter("S"."CHANNEL_ID"=2)
  12 - access("T"."CALENDAR_MONTH_NUMBER"=11)
  13 - access("T"."CALENDAR_YEAR"=1998)
  15 - access("S"."TIME_ID"="T"."TIME_ID")
  18 - filter("P"."PROD_CATEGORY_DESC"='Hardware')
  20 - access("S"."PROD_ID"="P"."PROD_ID")
  23 - filter("C"."CUST_STATE_PROVINCE"='CA')
  24 - access(ROWID=ROWID)
  26 - access("C"."CUST_STATE_PROVINCE"='CA')
  29 - access("S"."CUST_ID"="C"."CUST_ID")

Note
-----
   - star transformation used for this statement

Now that’s interesting. The data comes back faster than when we had star transformations disabled, the cost is slightly higher, and the autotrace output reports that a star transformation was in fact used on this query, even though the fact table didn’t have bitmap indexes on its dimension columns. Looking at the execution plan (and I’m certainly no expert on interpreting explain plans here, so there’s a good chance I’ve got the wrong end of the stick here) but it looks like the optimizer is creating on-the-fly bitmap indexes (”BITMAP CONVERSION FROM ROWIDS”) and then scanning these bitmaps as part of a star transformation (”BITMAP CONVERSION TO ROWIDS”). Contrast this with the bitmap-indexed tables’ execution plan, where only the “BITMAP CONVERSION TO ROWIDS” step takes place. I could well be wrong here (I suspect I am) but if anyone knows any more about this - I’m on 10g 10.2.0.3 Enterprise Edition on Windows XP SP2 - let me know as I’m definitely interested to know what’s going on.

So, it looks like at least in terms of generating the most efficient execution plan, a fact table with bitmap indexes on the dimension key columns looks like being the optimal solution, although this apparent “on-the-fly” creation of bitmap indexes from regular B*Tree indexes looks interesting. So what about the question about index size?

SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME         Size in MB
-------------------- ----------
CHANNEL_ID_BIX            3.072
CHANNEL_ID_IDX           122.88
CUST_ID_BIX              27.648
CUST_ID_IDX             131.072
PROD_ID_BIX               2.048
PROD_ID_IDX              122.88
TIME_ID_BIX               4.096
TIME_ID_IDX              163.84

8 rows selected.

So the normal non-unique indexes are currently over 100MB in size each, whilst the bitmap indexes (which are highly compressed) are below 5MB for the small dimensions, and around 27MB for the large, 55000-row dimension. Now we’ll see how these are affected by inserts and updates to the tables.

SQL> insert into sales_bix
  2  select *
  3  from   sh.sales
  4  /

918843 rows created.

Elapsed: 00:00:56.67
SQL> commit
  2  /

Commit complete.

Elapsed: 00:00:00.39
SQL> insert into sales_idx
  2  select *
  3  from   sh.sales
  4  /

918843 rows created.

Elapsed: 00:09:55.17
SQL> commit;

Commit complete.

Elapsed: 00:00:00.02
SQL>
SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME         Size in MB
-------------------- ----------
CHANNEL_ID_BIX            3.072
CHANNEL_ID_IDX          139.264
CUST_ID_BIX              27.648
CUST_ID_IDX             180.224
PROD_ID_BIX               2.048
PROD_ID_IDX             147.456
TIME_ID_BIX               4.096
TIME_ID_IDX             196.608

8 rows selected.

Ok, so the table with bitmapped indexes took just under a minute to insert rows in to (1m rows into a table of about 7m rows) and the indexes stayed the same size, whilst the corresponding insert into the table with regular indexes took around 9 minutes and increased the index size by between 10% and 40%. Note that with this insert, no new dimension key values were used, it was the same set of data being inserted back into the test tables.

And now the same for updates. I’ll do updates to the CUST_ID column as this has the largest amount of key values (55000-odd). Here are the results:

SQL> update sales_bix
  2  set    cust_id = 3001
  3  where  time_id between to_date('01-jan-2000','DD-MON-YYYY')
  4         and to_date('30-apr-2000','DD-MON-YYYY')
  5  and    channel_id =3
  6  /

461151 rows updated.

Elapsed: 00:00:45.51
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL> update sales_idx
  2  set    cust_id = 3000
  3  where  time_id between to_date('01-jan-1999','DD-MON-YYYY')
  4         and to_date('30-apr-1999','DD-MON-YYYY')
  5  and    channel_id = 2
  6  /

186120 rows updated.

Elapsed: 00:03:01.97
SQL> update sales_idx
  2  set    cust_id = 3001
  3  where  time_id between to_date('01-jan-2000','DD-MON-YYYY')
  4         and to_date('30-apr-2000','DD-MON-YYYY')
  5  and    channel_id =3
  6  /

461151 rows updated.

Elapsed: 00:06:18.26
SQL> commit;

Commit complete.

Elapsed: 00:00:00.05
SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME         Size in MB
-------------------- ----------
CHANNEL_ID_BIX            3.072
CHANNEL_ID_IDX          139.264
CUST_ID_BIX              28.672
CUST_ID_IDX             196.608
PROD_ID_BIX               2.048
PROD_ID_IDX             147.456
TIME_ID_BIX               4.096
TIME_ID_IDX             196.608

8 rows selected.

Ok, so the CUST_ID_IDX index has got bigger, but the bitmap indexes have stayed the same. What if I create another 5,000 customers, create another 100,000 sales records using these new customers and add them two the two fact tables?

SQL> insert into sales_bix
  2  select *
  3  from   new_sales
  4  /

100000 rows created.

Elapsed: 00:00:03.78
SQL> insert into sales_idx
  2  select *
  3  from   new_sales
  4  /

100000 rows created.

Elapsed: 00:00:33.18
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL> select segment_name
  2  ,      bytes/1024000 "Size in MB"
  3  from   user_segments
  4  where  segment_name like 'CUST_ID%'
  5  or     segment_name like 'PROD_ID%'
  6  or     segment_name like 'CHANNEL_ID%'
  7  or     segment_name like 'TIME_ID%'
  8  /

SEGMENT_NAME              Size in MB
------------------------- ----------
CHANNEL_ID_BIX                 3.072
CHANNEL_ID_IDX               147.456
CUST_ID_BIX                   28.672
CUST_ID_IDX                  196.608
PROD_ID_BIX                    2.048
PROD_ID_IDX                  147.456
TIME_ID_BIX                    4.096
TIME_ID_IDX                  196.608

8 rows selected.

So, no difference then, and the bitmap-indexed table is still faster to insert data in to, compared to a regularly-indexed table.

What I’ve found then is that, with my data set and my laptop, there’s no noticeable degradation when maintaining a fact table with bitmap indexes on the fact table columns, and in fact activities on this table are faster than the same activities on a table with B*Tree indexes. Now I’m of course open to the idea that my data set is skewed, or my environment is unrepresentative, but I’m surprised nonetheless as I expected all hell to break loose once I tried updating and inserting into column with bitmap indexes on them, and that’s not what I got in my testing. Maybe my fact table is too small (7m rows) or my dimensions are too small (two are <100 rows, one is 1.5k and one is 55k rows), but there was a clear performance difference between the table with regular indexes and bitmap indexes, so we can’t put it down to such a small data set that all activities ran in just a second or so. Strange.

Also, the star transformation taking place on the table with regular, non-unique indexes was strange as well. I had a quick email exchange with Peter Scott and he mentioned that DB/2, for example, also creates bitmap indexes on the fly, so this could be something that was introduced in 10gR1 or R2 and I didn’t notice at the time. It was an interesting exercise though as I thought firstly that bitmap indexes being updated would be a performance killer, and it wasn’t, and I was pleasantly surprised when the star transformation took place so easily (and in fact appears to have taken place even when I didn’t have the neccessary bitmap indexes in place).

We’ll be running all this on the real data set in a couple of weeks, including trying to update table columns with bitmap indexes on them, so it’ll be interesting to see whether this test data set ends up being representative of the real-world data; also, if the bitmap indexes revert to type (which I suspect they will) it’ll be interesting to see where I went wrong in my testing. I’ve you’ve spotted anything already (or indeed if this is in fact the correct behavior, especially the star transformation bit) let me know as I’m keen to get these unresolved bits sorted out. Also, I’m conscious that my testing is on 10g 10.2.0.3 whilst the client is on 10.1.0.5, so if this is a 10gR2-specific behavior, again let me know.

MetaShare Beta 1

Posted on the July 27th, 2007. Read times

Source: Mark Garner's Business Intelligence Blog [link]

MetaShare Beta 1 is now available at http://metashare.sourceforge.net.  This release now includes the TIFL document architecture that allows for construction of various target drivers.  The two target drivers that I’ve written so far are FlexWiki and HTML.  The TIFL format is very close to HTML so adding HTML wasn’t too hard.  There is a new command line switch to choose which output is wanted.

I’d like to consider writing a driver for a couple more targets including SharePoint wiki as well as a product called Perspective wiki.   This is a feature request that I’ll add and we’ll see where it goes.

I had thought about implementing SQL authentication for the suite of tools instead of using integrated authentication, but I’d like to give it a go at using integrated authentication for now.  I like having built in security for the suite so that not just anyone can log on to the website and update/mess up the internal metadata.  Hopefully someone will give some feedback in either a comment here on the blog or a feature/bug request on the SourceForge site.

MetaShare is still a beta product and I’d really like some feedback on how it is working for everyone.  I’d really like to take feedback into account before christening it version 1 and having an “official” release.  Let me know any suggestions or problems.  Hopefully we can make this a value add to the Microsoft BI stack.

¿Buscando las mejores herramientas 2.0 para tu empresa?

Posted on the July 26th, 2007. Read times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]

Pues aquí las tenemos…

ITRedux

Una recopilación muy interesante de la mano de ITRedux, que seguro que se irá actualizando y mejorando, pues no dejan de aparecer nuevas novedades interesantes en el mercado.

Este es el enlace que no os debéis perder: http://itredux.com/office-20/database/

Además, nos da una serie de datos relativas a las mismas muy intesantes. Si tienen blog, comparativa de características, etc…
Una gran forma de pode manejar tu negocio via web, y con unos costes muy razonables. Definitivamente, la forma en que se organizan las empresas de hoy en día estan cambiando.

Readers Berating CIO Magazine for Layout Equivalent of Spamming

Posted on the July 26th, 2007. Read times

Source: Clickstream [link]

I had more fun reading the comments on this CIO Magazine article than the article itself. In fact, I didn’t finish the article because it was the thing that pissed all the readers off.

Lesson to learn: sure, you may be trying to maximize ad revenue, but dropping the ad to content ratio to 1 ad per 100 words is taking it too far. They have 19 (!) pages of article, one paragraph per page. The side effects are damn funny, though. The comment paragrpahs count runs 5:1 over the article paragraph count.

Web Data Sourcing Tools We Used for the Mashup Contest

Posted on the July 26th, 2007. Read times

Source: Clickstream [link]

We just finished recording a podcast for IBM DeveloperWorks which will be up in the next few days so I was looking to see what else happened at Mashup Camp while we were writing code. We got a mention at Programmable Web (first place I go to see what’s new in the mashup space). I wish we had been able to spend more time in sessions, but being heads-down in new tools was still worthwhile.

Apart from the possibility of winning prizes, this is the best learning environment I’ve found for this space. Unless you spend a lot of time reading through blogs, you aren’t going to find many resources. Besides, nothing beats learning from other people while doing.

Here’s the rundown of tools Renat and I worked with:
QEDwiki (not an official product or downloadable, yet)
Apatar
Dapper
Kapow
Yahoo! Pipes
and lots of Google sources

One thing that hasn’t been mentioned in most of the news is that all these companies had people at Camp. To be honest, if it weren’t for Dan Gisolfi and Meg Sorber from IBM, we never would have finished work by the time the event closed. They stayed up late to help us with problems, bugs and techniques using QEDwiki.

I used Dapper a lot to scrape pages and make RSS feeds. I ran into problems with Dapper and the bad HTML practices of some web sites. Fortunately, Eran Shir and Jon Aizen (the CEO and CTO of Dapper) were there to help out.

Out of all the things I’ve worked with, theirs is the most impressive because of its simplicity. Unlike Pipes, which manipulates RSS feeds, Dapper scrapes pages and turns them into feeds in many different formats. Dapper + Pipes is a great combination. Kapow is an industrial strength scraper, so Dapper is not as powerful as Kapow’s tools, but it’s a lot easier to use for something quick and dirty.

We used Apatar because it was the only way short of directly coding to APIs to get data from Salesforce.com. And it’s open source. And Renat knows how to use it. It’s not a page scraper, it’s a data integration tool, so it does things these other tools can’t. Overall, the combination of a DI tool, a scraper, and a manipulation and delivery formatting tool are what you need to get data for mashups if you’re doing it inside an IT shop.

QEDwiki is the assembly hub, so it doesn’t provide data sourcing or manipulation features. IBM is going to include another tool in the kit for that. They did a demo of this during Mashup U, but didn’t have it available for us.

RSS Sucks For Mashups, But It’s Ubiquitous

Posted on the July 26th, 2007. Read times

Source: Clickstream [link]

Another thing we learned while banging away at web sites to get data was that RSS stinks for slinging web data, yet the tools available don’t deal well with Atom feeds. Atom is great for this purpose, but hard to locate tools for (other than code libraries). Being both ignorant and lazy, I’d prefer to integrate tools rather than write 3x that much code in PHP and Python.

Our first attempt to grab data from Salesforce.com generated an Atom feed. We did that because we wanted to get a bunch of data and feed different elements to other extractors to collect data. Sadly, none of the tools we were using could consume Atom. We had a choice of programming a custom widget for parsing Atom to get the data we wanted or switching to RSS. We did the latter because it was expedient given the time constraints.

The problem with using RSS for data is that it only has a few fields to store data in. We ended up using the title, description, link, and even the date fields. This meant having to take 8 fields of extracted data and publish two separate RSS feeds, or stuffing multiple values into a single field like description. Luckily, we didn’t need some fields broken out so stuffed things into fewer fields. Still, I had something like nine RSS feeds going at one point.

The other side was getting feeds. Lots of sites generate RSS feeds, but no Atom. So we’re stuck with crappy feeds or we do it ourselves, but then have to DIY the consuming side as well. Let’s hope Atom usage picks up in the future.

Hiding an Entire Dimension Using Role Based Security

Posted on the July 26th, 2007. Read times

Source: Jesse Orosz: Analysis Services Blog [link]

It was a nice three weeks in Peru but I’m glad to be back in the land of warm showers, toilets with seats, and cold drinks. I’ll miss the $4 filet mignon’s though.
 
Anyway, while I was gone I had a request come in about setting up security for a single user (we’re a very liberal company; we give all our employees access to all data!). The user needed access to a cube, but only have access to five of ten dimensions within it. Easy enough I thought. Finally a chance for me to screw around with security.
 
Got to working on it. Role based security - simple. Went to the Dimensions option in Edit Role, looked at the drop down for Access…yet didn’t see ‘None’. HUH? Only Read and Read/Write. Messed around and couldn’t find an acceptable solution so I posted something to the newsgroup.
 
Deepak responded, informed me I wasn’t the only person with this question, and pointed to a thread on MSDN.
 

Christopher filed an issue with Connect, however it was closed with a status of ‘Won’t Fix’. Ugh. If you can do it on the cube level, how hard would it be to do the same on the dimensional level?
 
My solution was to create a new cube with linked measure groups and dimensions. Yea, it’s probably about the worst solution possible, but the user needed access within 24 hours.
 
Totally frustrating.

Good Fortune

Posted on the July 25th, 2007. Read times

Source: Pete-s random notes [link]

Nothing of interest to the technologist here, so best not to read it

Next Page »