BI Blogs

Bringing together Business Intelligence voices from across the web

The Oncoming Microsoft ASP.NET MVC with Some LINQ Tossed In - Part 2

Posted on January 31st, 2008.

Source: Loosely Coupled Human Code Factory [link]

In the first part of this series I went over how to create a basic skeleton of controllers and models so that we could create a mock test and get it green lighted along with some other tests for getting the basic navigation methods tested and green lighted.  In this part of the series I’m going to go over the model from a LINQ point of view.  After that in subsequent parts of this series I’ll start building out the views, utilizing of course a test driven development process.  So no…(read more)

Open Reports: Another Open Source Alternative for Reporting

Posted on January 31st, 2008.

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]

Open Reports

We have more and more alternatives for building our reports and Business Intelligence solutions with free software tools.
Not long ago we published an exclusive document comparing the main open source reporting solutions: Pentaho, Jasper and BIRT.

Today we add Open Reports, which has some interesting new features. Although it is not as powerful or as complete as the others, its PRO version offers some interesting elements (such as the inclusion of OLAP queries via Mondrian and JPivot), and the open source version is a very fast, easy-to-use package.
It is a web environment that supports the reporting systems mentioned above.

Features:
- Support for a wide variety of export formats including PDF, HTML, CSV, XLS, RTF, and Image.
- Web based Administration of Users, Groups, Reports, Parameters, and DataSources.
- Flexible Scheduling including Hourly, Daily, Weekly, Monthly and Cron scheduling and multiple recipients.
- Comprehensive Report Parameter support including Date, Text, List, Query, and Boolean parameters.
- Fine-grained security controls access to Reports, Scheduling, and Administration functionality.
- Report Auditing tracks start time, duration, status, and user of every report generated.
- Support for multiple JNDI or Connection Pool DataSources for use in generating reports.
- Support for Drill Down reports and external application integration via secure report generation URL.


You can follow the latest news on its blog.
Don’t forget to take a look at the AJAX Report Viewer, a very interesting demo of what Open Reports can do.

Building ETL processes, Childsplay ?!?

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

In 1996 I started building ETL processes in the SAS System for a customer. This customer needed a metadata-driven approach for their Data Warehouse solution.

We built a nice collection of datasets / tables which contained the metadata about the source, the target and the transformations. This was my very first encounter with metadata-driven ETL processes. There were no tools at that time like Informatica PowerCenter, Microsoft SSIS and SAS Data Integrator. Everything was built in pure macro code which accessed the metadata and created code on the fly.


Now 12 years later, (more…)

Funny Architectural Wisdom

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

Today I read the BLOG of Edwin Zwart, a colleague of mine at Capgemini The Netherlands. He started a BLOG last December and uses a lot of YouTube visualization in his posts. I must admit, this is a real piece of out-of-the-box thinking and working (his post).

I haven’t read his whole BLOG, but I would certainly like to share one of his posts with you, my readers. (more…)

Life explained

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

On the first day, God created the dog and said:

"Sit all day by the door of your house and bark at anyone
Who comes in or walks past. For this, I will give you a life
span of twenty years."

The dog said: "That’s a long time to be barking. How about
only ten years and I’ll give you back the other ten?"

So God agreed.


On the second day, God created the monkey and said:

(more…)

Walter Smetsers is Certified Data Vault Expert

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

Today I got a message from Dan Linstedt, stating that I have passed the Data Vault Exam.

Original phrase from the mail;

So as of today I am allowed to add certified DATA VAULT EXPERT to my LinkedIn profile and other signatures. Now let’s hope I will be able to make this wave happen in Europe together with my Data Vault Expert colleagues. We will start with a workshop within my company where we will try to build a DV model from a sample database of one of the famous database systems (M$).


Hi Walter,

Congratulations, you are certified. ***

Thank-you,

Dan Linstedt

A Wise Happy X-mas from The Netherlands

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

Happy X-Mas from the Netherlands

He knew how to handle the situation.

 



Consolidation in BI Consultancy

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

In 2007 we saw a lot of consolidation going on in Business Intelligence tools, but did we see this one coming as well?!

I got a message from a colleague of mine, stating that there was a rumour about Capgemini being taken over by a big Indian company called Wipro.


Googling on WIPRO and Capgemini gave me the next results; (more…)

Thoughts about Record Source in Data Vault Modelling

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

Today I was at the BDWM course at my customer. The teacher explained to us how we should use the SOURCE SYSTEM COMPONENT within the IBM Banking Data Warehouse Model (BDWM). The resemblance to the Record Source field in a Data Vault data model struck me.

Both are used, in essence, to provide information about which source system was responsible for delivering the information / data, but only one of them has a 3NF which is right in my (humble) opinion. I am actually saying that the Data Vault is not complete, from how I see it… Let me explain my view on this matter;

IBM’s BDWM contains a SOURCE SYSTEM COMPONENT for each row in, for instance, the Involved Party Identifier entity. The structure for this entity looks much like the structure for a HUB in a Data Vault Model. It contains the natural identifier, the providing source system, the type of identifier, and a datetime stamp on which the row was first loaded. For history reasons, there are also effective and enddate time stamps in this entity.


(more…)

IBM to Acquire Cognos to Accelerate Information on Demand Business Initiative

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

In the following article you can read about the latest news about IBM acquiring Cognos…

http://www.cognos.com/news/releases/2007/1112.html?mc=-web_hp

[Update 20071113] new articles and the POLL

(more…)

Apple and Business Intelligence

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

Last week I got a mail from Jorgen Heizenberg, stating that he had reacted on his blog (www.biguru-online.com) to the acquisition of Cognos by IBM. While reading his BLOG I found another article which interested me. It was about SUPER FANCY SEXY Business Intelligence (see article), where a new Apple iPhone was placed beside the text.

Jorgen states in his article that it is important not only to have a working solution, but that it has to be SUPER FANCY SEXY as well. What he’s trying to point out is that acceptance is higher when the product looks nice. You see that with the Apple iPhone as well: it looks nice, so people accept the fact that the product is not completely error-free and lacks a few things. But again, a phone is for calling and that’s the primary function of the tool, whether it is an Apple iPhone, an HTC Touch, a Nokia N95 or another mobile phone.

Reading the internet on my quest for a new gadget (read; smartphone), (more…)

Wisdom to the Mass !!!

Posted on January 31st, 2008.

Source: Dutch Business Intelligence Blog [link]

Sapientia est potentia

Last night I read an article on NU.NL. It stated that Peter Breedveld was sued for a post on his Blog. Since I am blogging as well I was interested in the content of the topic and how this could affect the way BLOGGERs are profiling or writing their stories.

Well, Peter shared his experiences with Vodafone with the (internet) world and mentioned the name of the call center employee who "sold" him a contract, including a nice ‘new’ phone. You can read his anger and disappointment in the topic, but I was interested in the details behind the names and people…


Peter Breedveld had replaced the name of the employee with *s, but this BLOG is about wisdom to the mass, so here’s the story of my quest for information and knowledge. (more…)

“Huge culture class” over school metrics

Posted on January 31st, 2008.

Source: datadoodle [link]

A new source in the education-testing business tells me about a “huge cultural collision” between the “sensate, feeling types and the new racetrack bettor types.”

(more…)

Wouldn’t It Be Cool?

Posted on January 31st, 2008.

Source: Keep It Simple [link]

I like this wish-list from Jill Dyché and Evan Levy for business intelligence and data integration in 2008:

Wouldn’t it be cool if companies really believed in data self-service?
Wouldn’t it be cool if application systems were measured on data accessibility and sharing, and not just processing and uptime?
Wouldn’t it be cool if data-as-a-service became a reality?
Wouldn’t […]

OWB11g, DML Error Logging and Data Rules

Posted on January 31st, 2008.

Source: Mark Rittman's Oracle Weblog [link]

One of the new features in OWB 11g (and back-ported to OWB 10.2.0.3) is support for DML Error Logging. DML Error Logging is a database feature introduced with Oracle 10gR2 that lets you add a LOG ERRORS clause to your DML statement, so that DML operations that would normally fail due to a constraint violation will instead move the erroneous rows into a log table and complete the load as normal. DML Error Logging is particularly interesting to data warehouse people as it means we can safely load our tables using direct path inserts rather than having to write complex PL/SQL to catch any exceptions.
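As a rough illustration of the moving parts, here is a minimal sketch of a load that uses the clause. It assumes the sales_src and sales_target_con tables used in the tests later in this post; the tag value 'load_20080131' is a hypothetical label, not something from the original tests. The log table is created with DBMS_ERRLOG.CREATE_ERROR_LOG, which by default names it ERR$_ followed by the target table name.

```sql
-- Create the error log table for the target table.
-- With default arguments it is named ERR$_SALES_TARGET_CON.
BEGIN
   DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name => 'SALES_TARGET_CON');
END;
/

-- Direct path load. Rows that violate a constraint are written to the
-- log table instead of failing the whole statement.
-- 'load_20080131' is a hypothetical tag used to identify this run.
INSERT /*+ APPEND */
INTO   sales_target_con
SELECT *
FROM   sales_src
LOG ERRORS INTO err$_sales_target_con ('load_20080131')
REJECT LIMIT UNLIMITED;

COMMIT;

-- Inspect which rows were rejected and why.
SELECT ora_err_number$, ora_err_mesg$, ora_err_tag$
FROM   err$_sales_target_con
WHERE  ora_err_tag$ = 'load_20080131';
```

The tag is optional, but it makes it easier to separate the rejects of one load from those of earlier runs when the same log table is reused.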

I wrote about DML Error Logging in a blog post last year and in this article for Oracle Magazine. One of the drawbacks in the 10gR2 implementation of DML Error Logging was when you tried to use it in conjunction with conventional path inserts, which for whatever reason caused the statement to run a lot slower than if you didn’t include the LOG ERRORS clause, and so I was interested to find out whether that issue still occurred with 11g. Looking back at the article and in particular the blog post, even direct path loads using LOG ERRORS came in a little bit slower than writing your own exception handler in PL/SQL, so I was also interested to know whether OWB mappings that used DML Error Logging were faster or slower than mappings that used data rules, which switch the mapping over to row-based mode and load the table using PL/SQL.

To start off then, I thought I’d recreate the tests used in the Oracle Magazine article and see how the timings compare when using the 11g version of Oracle rather than 10gR2. I wasn’t really looking at absolute times - I’m running Oracle on a different machine to the one I used then - but instead at relative times: conventional path versus direct path inserts, each with and without the LOG ERRORS clause, compared with a PL/SQL row-by-row insert using the SAVE EXCEPTIONS clause.

I started off by re-creating the tables in the two articles and ran a direct path insert using DML Error Logging:

insert /*+ APPEND */
into   sales_target_con
select *
from   sales_src
log errors
reject limit unlimited
;

918834 rows created.

Elapsed: 00:00:03.13

OK, around 3 seconds, which compares well with the 5 seconds in my 10gR2 test. What if I run the insert in conventional path?

insert
into   sales_target_con
select *
from   sales_src
log errors
reject limit unlimited
;

918834 rows created.

Elapsed: 00:00:29.67

So that’s around 30 seconds compared to 3 seconds. But how much of that is down to just running in conventional path mode? Let’s run the first one again, take off the LOG ERRORS clause but remove the rows that could cause an error.

insert /*+ APPEND */
into   sales_target_con
select *
from   sales_src
where  promo_id is not null
and    amount_sold > 0
;

918834 rows created.

Elapsed: 00:00:04.05

That’s interesting, it’s slightly slower to filter out the bad rows rather than have LOG ERRORS deal with it. It’s within a margin of error though so I wouldn’t read too much into this.

What about if we run the conventional path insert without the LOG ERRORS clause?

insert
into   sales_target_con
select *
from   sales_src
where  promo_id is not null
and    amount_sold > 0
;

918834 rows created.

Elapsed: 00:00:04.21

OK, it’s back to four seconds. So there’s definitely a performance hit when you do a conventional path insert and choose the LOG ERRORS clause, so that issue is still there with 11g.

Before going on to the OWB test I also ran the four insert statements with primary key constraints on the table being inserted in to - the original test didn’t have a primary key on this table, but as in real life I’m likely to require one, and the data rule feature in OWB requires that we have one in place, I gave it a go.

The timings for the four inserts with a primary key present were

  1. Direct path insert with DML Error Logging - 22 seconds
  2. Conventional path insert with DML Error Logging - 43 seconds
  3. Direct path insert with no error logging - 19 seconds
  4. Conventional path insert with no error logging - 27 seconds

Given the non-scientific nature of the tests I wouldn’t read too much into differences of a few seconds, but certainly running DML Error Logging in conventional path still adds quite a big overhead to your table insert.

In the original article, I also compared DML Error Logging with custom PL/SQL code that used the SAVE EXCEPTIONS clause. How did this perform in 11g?

Running the code without a primary key constraint on the target table came in as follows:

DECLARE
   -- Collection type holding one batch of source rows
   TYPE array IS TABLE OF sales_target_con_plsql%ROWTYPE
      INDEX BY BINARY_INTEGER;
   sales_src_arr   array;
   errors          NUMBER;
   error_mesg      VARCHAR2(255);
   bulk_error      EXCEPTION;
   l_cnt           NUMBER := 0;
   -- ORA-24381 is raised when FORALL ... SAVE EXCEPTIONS had failed rows
   PRAGMA EXCEPTION_INIT (bulk_error, -24381);
   CURSOR c IS
      SELECT *
      FROM   sales_src;
BEGIN
   OPEN c;
   LOOP
      -- Fetch the source rows in batches of 100
      FETCH c BULK COLLECT INTO sales_src_arr LIMIT 100;
      BEGIN
         FORALL i IN 1 .. sales_src_arr.COUNT SAVE EXCEPTIONS
            INSERT INTO sales_target_con_plsql
            VALUES sales_src_arr(i);
      EXCEPTION
         WHEN bulk_error THEN
            errors := SQL%BULK_EXCEPTIONS.COUNT;
            l_cnt  := l_cnt + errors;
            -- Log the error message for each failed row in the batch
            FOR i IN 1 .. errors LOOP
               error_mesg := SQLERRM(-SQL%BULK_EXCEPTIONS(i).ERROR_CODE);
               INSERT INTO sales_target_con_plsql_err
               VALUES (error_mesg);
            END LOOP;
      END;
      EXIT WHEN c%NOTFOUND;
   END LOOP;
   CLOSE c;
   DBMS_OUTPUT.PUT_LINE (l_cnt || ' total errors');
END;
/
9 total errors

PL/SQL procedure successfully completed.

Elapsed: 00:00:11.03

Running it again with the primary key in place brought the time up to around 19 seconds. If we take the most realistic scenario - inserting into a table that has a primary key - the timings now come out like this:

  1. Direct path without DML Error Logging - 19 seconds
  2. PL/SQL with SAVE EXCEPTIONS - 19 seconds
  3. Direct path insert with DML Error Logging - 22 seconds
  4. Conventional path insert without error logging - 27 seconds
  5. Conventional path insert with error logging - 43 seconds

This would seem to suggest that the fastest options are to strip out the erroneous rows beforehand and then run your insert in direct path mode, or alternatively to write your own PL/SQL error handler. Failing that, do a direct path insert with error logging turned on. Whatever you do, don’t use error logging with a conventional path insert, otherwise your load will grind to a halt.
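One way to sketch the “strip out the erroneous rows beforehand” option is a single multitable insert that routes clean rows to the target and everything else to a reject table. This is an illustration only and was not part of the timed tests: sales_src_rejects is a hypothetical table with the same columns as sales_src, and whether the load really runs in direct path mode depends on the usual restrictions that apply to the APPEND hint.

```sql
-- Route rows that pass both checks to the target table and all other
-- rows to a reject table, in one pass over the source.
-- sales_src_rejects is a hypothetical table shaped like sales_src.
INSERT /*+ APPEND */ FIRST
   WHEN promo_id IS NOT NULL AND amount_sold > 0 THEN
      INTO sales_target_con
   ELSE
      INTO sales_src_rejects
SELECT *
FROM   sales_src;

COMMIT;
```

The WHEN condition mirrors the filter used in the earlier test statements, so a row with a null AMOUNT_SOLD also ends up in the reject table rather than in the target.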

So, how does this all relate to Warehouse Builder then? As a quick recap, Warehouse Builder from 10.2.0.3 onwards lets you enter an “Error Table” name for each target table in a mapping, whereupon it will insert a LOG ERRORS clause into the DML statement that loads the table. If you’ve licensed the Data Quality Option for OWB, as an alternative you can define a data rule and attach it to a table, which again makes Warehouse Builder direct all the error rows into an error table (albeit one that OWB defines for you automatically). The kicker though is that mappings using data rules then run only in row-based, PL/SQL mode, which when all things are considered usually runs a bit slower than set-based code. I wonder though, whether these latest versions of OWB actually use DML Error Logging for the data rule mappings and then run in set-based mode as well, and to find this out I put a couple of mappings together. For all these tests onwards, I’ve added a primary key to the target table as this is the most realistic scenario.

The mapping itself performed the same load logic as the SQL statements beforehand. For the mapping that was going to use DML Error Logging, I just copied the columns from the source table into the target table.

To turn on the DML Error Logging feature, I selected the target table and entered the log table name (created previously from SQL*Plus using DBMS_ERRLOG.CREATE_ERROR_LOG).

Taking a look at the mapping code that OWB generates for this, you can see the LOG ERRORS clause in place:

INSERT
    /*+ APPEND PARALLEL("SALES_TARGET_CON") */
    INTO
      "SALES_TARGET_CON"
      ("SALES_ID",
      "CUST_ID",
      "PROD_ID",
      "CHANNEL_ID",
      "TIME_ID",
      "PROMO_ID",
      "AMOUNT_SOLD",
      "QUANTITY_SOLD")
      (SELECT
  "SALES_SRC"."SALES_ID" "SALES_ID",
  "SALES_SRC"."CUST_ID" "CUST_ID",
  "SALES_SRC"."PROD_ID" "PROD_ID",
  "SALES_SRC"."CHANNEL_ID" "CHANNEL_ID",
  "SALES_SRC"."TIME_ID" "TIME_ID",
  "SALES_SRC"."PROMO_ID" "PROMO_ID",
  "SALES_SRC"."AMOUNT_SOLD" "AMOUNT_SOLD",
  "SALES_SRC"."QUANTITY_SOLD" "QUANTITY_SOLD"
FROM
  "SALES_SRC"  "SALES_SRC"
      )
    LOG ERRORS INTO ERR$_SALES_TARGET_CON
          (get_audit_detail_id) REJECT LIMIT 50
    ;

So I save the mapping and switch over to the Control Center Manager, deploy the mapping and then run it. Looking at the timings, it came in at around 18 seconds, a little bit faster than the hand-written insert statement, but again within a margin of error.

For the mapping that used a data rule to handle errors, I first created a couple of manual data rules, one for AMOUNT_SOLD > 0 and one for PROMO_ID is NOT NULL. I then added these to the target table using the Data Object Editor.

I then created the mapping again, which this time showed some “error rows” on the target table operator as I’d now associated this table with a set of data rules.

Over on the table operator properties panel, I turned on the two data rules and set them to move all errors to the error logging table.

Taking a look at the generated code, it’s clear that the mapping isn’t using DML Error Logging; it still creates a few test selections, works out whether any errors have occurred, and then in some cases inserts into the target table using regular (i.e. non-DML Error Logging) direct path insert statements and in other cases inserts using BULK COLLECT .. FORALL. This is not necessarily a problem, since as we saw from the first tests, PL/SQL can actually be as fast as direct path inserts using LOG ERRORS, but it’ll be interesting to see how the load performs.

Running the test first time around was a bit of a surprise actually, as it still errored when trying to insert null values into the PROMO_ID column - this column has a NOT NULL constraint and is one of the constraints that causes the load otherwise to fail. When using DML Error Logging, the load carried on as normal though, whereas with a data rule, it must still be trying to insert the null value as the mapping errors at this point. Removing the NOT NULL constraint allows the mapping to run, which means that (and I remember this from when I first tried out data rules) it won’t protect you from an actual, physical constraint violation, as data rules assume that all data checks are carried out “virtually” within the OWB repository.
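Since data rules don’t guard against physical constraint violations, it can help to check the source against the target’s constraints before running such a mapping. A minimal sketch, assuming only the two checks discussed in this post (PROMO_ID not null, AMOUNT_SOLD greater than zero) and the sales_src table from the tests:

```sql
-- Count the source rows that would violate each check on the target.
-- COUNT ignores NULLs, so each CASE returns 1 only for offending rows.
SELECT COUNT(*)                                              AS total_rows,
       COUNT(CASE WHEN promo_id IS NULL THEN 1 END)          AS null_promo_id,
       COUNT(CASE WHEN amount_sold > 0 THEN NULL ELSE 1 END) AS bad_amount_sold
FROM   sales_src;
```

If null_promo_id comes back greater than zero while the NOT NULL constraint is still in place, the data rule mapping can be expected to fail in the way described above.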

Anyway, running the mapping again with the PROMO_ID constraint dropped gave a run time of between 2 and 3 minutes, which I checked a few times by re-running the mapping. Taking a look through the PL/SQL code generated shows that it’s not using the SAVE EXCEPTIONS clause, and that’s even with the “Bulk Processing Code” option checked in the mapping configuration, so whatever’s going on there, on the face of it it’s not as efficient a way of handling potential mapping errors as using the LOG ERRORS feature.

Of course with data rules, and the Data Quality Option in general, you get a lot more features, such as the ability to siphon off the error rows and correct them as part of the same mapping, plus of course your repository and mapping now contain metadata on the allowable values in your warehouse and how errors are corrected. Performance-wise though, as long as you ensure the table insert runs in direct path mode, using the LOG ERRORS clause is faster than using data rules, although of course clearing out potential errors prior to loading your tables, and then loading them without any additional error handling features, is usually faster still.

Anyway, support for DML Error Logging in OWB 11g and 10.2.0.3 looks like a pretty neat feature, and of course it doesn’t require you to license the Data Quality Option. Whilst I wouldn’t use it wholesale on all mappings - adding the clause slows down your inserts even if you don’t hit any errors - it’s a nice feature you can switch on fairly easily and it’s about the fastest way you can gracefully handle errors in a mapping.

Part 3: Secrets of the Masters

Posted on January 31st, 2008.

Source: Blog: Dan E. Linstedt [link]

Every good BI/EDW solution is backed by a good architecture, and DW2.0 is no different. The framework that DW2.0 provides is a sound one, with all the components necessary. That said, in addition to the framework, architectures need to exist at different levels, as do standards and templates. A solid enterprise data warehouse project usually contains many of the following components that implementers and consultants use to make a project successful.

SSIS Flat File Preview (Funny) Bug

Posted on January 30th, 2008.

Source: Miky Schreiber's Blog - BI [link]

This is a good one: When you build a flat file connection to a csv file, you can preview the data. There’s an option there to skip some rows (Data rows to skip). If you leave it set to a number greater than zero, the process itself will skip these rows!! I still wonder if this bug is By Design or not. If you wish, you can track this bug in Microsoft Connect.

Bernard Calls it Quits

Posted on January 30th, 2008.

Source: Keep It Simple [link]

I heard Bernard Liautaud received a well-deserved standing ovation at the final standalone Business Objects sales kickoff a couple of weeks ago. With today’s news of his resignation it became even more appropriate. Bernard’s been at the helm of Business Objects from day one and the company has consistently been a leader in the traditional on-premise business […]

Blogging Ideas

Posted on January 30th, 2008.

Source: Jesse Orosz: Analysis Services Blog [link]

I’m looking for something to blog about, any ideas?
 
jesperzz at hotmail dawt com

Business versus IT

Posted on January 30th, 2008.

Source: Frank Buytendijk Blog [link]

In his blog “bi for business people,” Tom Hudock revives the old business versus  IT debate, based on some of his recent experiences.

Although I totally agree with the general gist of his post, he makes a few remarks that I’d like to comment on. I know Tom likes a good debate as much as I do!

Tom argues that the sponsor for any BI project should come from the business, not from IT. The CIO often is not at the decision-making table, he observes, and he mentions “the golden rule”:  those with the money, make the rules.

You know what, I think Tom is right, but I still don’t agree. Maybe it is the case, but it shouldn’t be. I’ve always said that the only project approach more disastrous than the IT-driven project, is the business-driven project. No concept of architecture, no leverage for other areas, no real expertise in systems implementations.

I really would like to introduce another golden rule, or wait, let’s make it the platinum rule: “those with the knowledge and experience, make the rules.”  BI implementations, like most IT projects, have a strong business side and a strong IT component. Both IT and Business need to agree. Which brings me to a broader point, organizational maturity. In a project you collect the necessary skills, such as project management, technology skills and business skills, regardless of which department they reside in. If I see an “IT project,” or a “business project” doing BI, I know enough, it’s gonna fail. You need a “BI project,” or an “XYZ project.”

 

Bottom line: It’s not OR, it’s AND. And if the company culture doesn’t allow that, and one overrules the other, well, every organization gets the results it deserves.

 

-frank

 
