BI Blogs

Bringing together Business Intelligence voices from across the web

A Giant Stumbles - Comdex Trade Show Cancelled AGAIN

Posted on the March 31st, 2005. Read 871 times

Source: Blog: Claudia Imhoff [link]

At one time, Comdex was the largest high-tech tradeshow in the world. Today, it was announced by the show’s owner that the show has been cancelled — for a second year!

Open Source OLAP

Posted on the March 31st, 2005. Read 502 times

Source: bayon blog [link]

Every month I review the web traffic reports for my blog, and I’ve always found something rather interesting. Even though I post more information about Oracle and OWB than any other subject, Google seems to send me more traffic from queries like “open source ETL” and “open source OLAP.” You know what they say, customer is king and you gotta give ‘em what they want!

In other words, all I needed was just a teentsee weentsee bit of an excuse to take some time to really kick the tires of the open source OLAP server Mondrian.

Some basics… Mondrian is an open source OLAP server, written in Java. It implements an MDX engine, and also exposes an XML/A interface to clients. Mondrian uses a ROLAP architecture, and ends up issuing SQL statements to a JDBC data source to retrieve and calculate. Mondrian works with Access, MySQL, postgres, and Oracle. Refer to the Mondrian architecture pages to get some more information about the architecture.

My overall impressions were positive; it’s a good core set of functionality and performs rather well. Like any Open Source project it is an alphabet soup of supporting libraries, environment variables, generators, frameworks, and takes more than the usual 10 minute commercial product install. We’re not building a kernel here, but it’s not trivial to get the examples up and running.

Mondrian works closely (and appropriately) with an open source implementation of a JSP based Pivot and Charting project, JPivot. The demo for Mondrian includes an example with JPivot querying Mondrian and uses the well known Food Mart demo. I was expecting a bit less from JPivot and was pleasantly surprised that it’s actually rather functional (not commercial product easy to use, but really quite commendable).

JPivot allows for drilling down hierarchies (I don’t think you can use multiple hierarchies) and Pivoting and exhanging the columns and rows. It has a “CUBE” editor that allows you to edit the report. It’s not drag and drop, but definitely works if you “grok” the interface.

Also pleasantly surprising was some pretty decent charting capabilities.

There are some decent selections of charts

In order to provide an OLAP view of your data you have to define some metadata about your Dimensional model (Cubes, Measures, Dimensions) and how they map to your underlying Relational Schema. Check out the samples on the Mondrian site to see how to write your own schema.

Couple of interesting things to point out, Mondrian implements a cache of “relations” used to increase performance. This is interesting because of consistency questions (some fragments are cached, but others are current) but also because it is WICKED fast once it’s loaded into memory. There are some interesting possibilities here, including some work with some distributed P2P OLAP distributed caching research.

DISCOVERER MEETS DUKE NUKEM

Posted on the March 31st, 2005. Read 437 times

Source: bayon blog [link]

Is there any better way to spend a Monday than to build a 3D Dashboard (using some sample reports from my upcoming OWB course)?

You can obviously see I’m not describing 3D report effects in the context of a web based reporting application, but rather a 3D environment that has the actual dashboard content displayed in 3D space. I had been giving a lot of thought recently to the constant craving of executives for BI dashboards. In fact I think some organizations might consider implementing such a solution because of the following two realities:

  1. Executives love web based dashboards, but what they really want is to have the NASDAQ MarketSite at their office. Screen upon screen of everything they need to know to make great decisions. Building out hundreds of LCD panels might not fly with shareholders though, and I don’t blame them as they are very expensive. With this sort of interface, one can still get the “control room” metaphor without the expense.
  2. The generation of “video game playing and we’ve been using computers since 2nd grade” are now being tapped into the highest C-level positions. It’s not uncommon for an executive to know how to play a little DOOM (even if it was more than a few years back). In other words, there are a significant number of executives that would not feel “uncomfortable” in front of a multiplayer game interface.

However, there are SIGNIFICANT limitations to a 3D space, not the least of which is the ability to handle 2D information quickly and effectively (a la web pages/actual applications). When I began the exercise I was certain that there would be no use in the project unless it could integrate back with EXISTING systems. There is no way one would be able to work effectively entirely in a 3D work, so it would have to play nicely with the current applications. I’ve done a very small integration with Oracle Discoverer where by clicking on one of the dashboard 3D objects you see Discoverer Viewer drilled to the sheet in question alongside the 3D world. Without something like this, where people could get to the “real work”, I think a 3D dashboard is pretty much a nonstarter.

One could take the metaphor further, and have virtual meetings in the control room. Here I’ve pictured the dashboard from the “3rd” person view instead of the “1st” person view. Consider a situation where the District Sales Manager calls the Senior VP of Sales and asks to “show him” some area of concern. He could literally take the SVP/Sales to the dashboard area and they could sift through the data together.

I sort of haphazardly found this 3D software, as I have a customer that used it (ActiveWorlds) as a prototype for a 3D community so their web site visitors could explore articles on their site, chat with each other, play 3D games, etc. While I was at that customer site I helped them integrate some “external media” into the 3D product, and found it rather straightforward and easy. I don’t mean to offend but I found the communities who play 3D games a bit strange and having little to no interest in the external application of their technologies. I found a myriad of resources on the legend and lore of ficticious 3D worlds (and rivalries between them) but few resources on how to “drill through with URL parameters.”

I’ve attached some screenshots in this article, and there is also a 12MB AVI file that demos the solution. I built it on a corporate system so I can’t offer up access to the outside world at this time. However, I greatly value the community traffic and opinions so if there’s enough interest perhaps I’ll see about “putting it up.” Email me if you’d like to actually play with it and I’ll see what I can do.
UPDATE: There were some issues with the CODEC for the AVI. If you had problems previously, feel free to try again.

ORACLE WAREHOUSE BUILDER : WORKSHOP I

Posted on the March 31st, 2005. Read 681 times

Source: bayon blog [link]

UPDATE UPDATE UPDATE : The class has moved to April 9th because of technical difficulties the PSOUG is reporting. Class is limited to 8 seats (very hands on) Email me or Dan Morgan to get registered…

I’ve developed a one day workshop to help those interested in how to get up and running QUICKLY with OWB. If any blog readers are in the greater Seattle area (or have always need an excuse for a weekend here) consider taking this course. Email me directly if you’re interested, and I can help get you registered for the workshop at the PSOUG lab on Mercer Island. Note: the PSOUG has not posted the event on calendar yet, but it’s most certainly scheduled for MARCH 26, 2005

ORACLE WAREHOUSE BUILDER : WORKSHOP I

This course teaches students how to install OWB into their database, and begin building Data Objects and ETL Mappings using OWB. Students will learn the basics of OWB architecture, ETL development, and executing their ETL. The students are required to install their own software and configure their own database to ensure they can leave the workshop with all the requisite skills to continue their OWB education at home, or in Workshop II. Students will be able to WOW their colleagues with their new found OWB skills building quick, optimized solutions to common ETL problems.

Course Objectives
After completing the course, students should:

  • Understand OWB architecture, including the distinction between Design Objects and Runtime Objects (and repositories!)
  • Be able to install OWB onto a database server
  • Install and configure design repositories, runtime repositories, and target schemas
  • Be able to retrieve metadata (table definitions, constraints, etc) and actually source data from remote Oracle databases
  • Be able to create and deploy Tables, Sequences, Cubes, and Dimensions.
  • Be able to create and deploy ETL mappings using multi table joins and custom transformation logic
  • Be able to sample, and load data from flat files for use in OWB
  • Be able to execute and troubleshoot ETL mappings built in OWB
  • Load a star-schema that demonstrates the immense benefit of ETL/dimensional modeling

are supplied to students at the beginning of each class. The class textbook will be optionally available for purchase from PSOUG at a substantial discount.

Class Format
Classes consist of approximately 1.5 hours of lecture supported by slides and whiteboarding. The balance of the day, approximately 6.5 hours, is hands-on. Each students works on their own Linux server.

Prerequisites
To be successful students must be comfortable with basic Oracle administration. The following is a short list of the suggested but not required skills:

  • Understand what schemas are, how they relate to users
  • Understand basic DDL (create table, alter table, create sequence, create or replace package) and DML (select, insert, update)
  • Know how to connect to Oracle using both Localnames (tnsnames.ora) and JDBC (host/port/SID)
  • Students will benefit most when they have a basic understanding of dimensional modeling (Cubes/Facts/StarSchemas)

RITTMAN ON OWB TUNING

Posted on the March 31st, 2005. Read 520 times

Source: bayon blog [link]

It is likely that most readers of the bayon blog are also readers of rittman.net. However, just in case someone missed his posts on tuning OWB Performance Tuning Framework I highly recommend that you read/bookmark them. For the casual OWB user this might not be necessary; developers who deal either with a great many mappings or mappings that process significant data volumes this is a “tool” to keep in your toolbox.

great method; the application of battle tested tuning techniques to an often used but rarely fully understood tool. I’m not sure how much time I’ll have over the next few weeks, but if I do perhaps I’ll throw my oar in as well. Things I can think of to contribute to the community dovetailing on Mark’s work include:

  • Unify the approach of monitoring the Clock Time and drill down to the performance data Mark has retrieved. ie, build a set of reports (or even a small data mart!) that consumes the public runtime performance views in the OWB runtime repository. I’m a big fan of quantifying the benefit of a tuning excercise. Perhaps a way to track which processes/mappings are slowing your loads, elapsed/time versus volume (ETL throughput), etc. Consume and understand the basic metrics, then drill down on a particular mapping into the information that Mark has developed.
  • An OMBPlus implementation of the “on/off” trace flag Mark refers to :The way of enabling tracing should either be built in by default, or easily added by the OWB developer. Ideally it should be simple to switch it on or off (perhaps levels of event 10046 tracing?). I bet one can create a script that adds/removes the pre/most mapping process that Mark has added manually to the mapping. In this method we could programmatically turn tracing on for a defined list (from one to entire repository) of OWB mappings. The logic to set the key for the process flow operators using OMBPlus might be a wee bit more difficult (would have to iterate through operators in a process and possibly make changes). The biggest drawback, and I don’t see a clear solution at this point, is how to change tracing at RUNTIME instead of deployment. Making the changes via OMB requires a redeployment but it would be most beneficial if tracing was in the code, and changed at runtime.

art of the conversation that’s been long overdue… Perhaps a small OWB coming of age? The technology is now used widely enough to warrant some of it’s own methods, patterns, and strategies for not just building but rather building well.

OWB10G PARIS : SCHEDULING

Posted on the March 31st, 2005. Read 468 times

Source: bayon blog [link]

A bayon blog reader wrote me and asked if I had any additional information on scheduling ETL mappings and process flows. I had not fiddled too much with that part of the product; the Oracle provided run_my_owb_stuff.sql and SQL Templates for Enterprise Manager had usually been sufficient for my customers. There are some new scheduling features in Paris and without the reader request I probably would have just passed them up. So while this covers a somewhat smaller feature of the new version, it’s still interesting and makes a nice installation in my OWB Paris Early Review series.

To address scheduling needs there are entirely new OWB objects, appropriately called “Schedule Modules” and “Schedules.” Those current users of the product are well aware that in previous versions of the product, no such objects existed so it’s helpful to show how these are used. One creates a new Module which is merely a container to hold the actual Schedule objects. Note: I haven’t found anything particular interesting about the module definition in terms of deployment or configuration, but perhaps it’s used in the multi-configuration deployments.

Then one creates the schedule that will contain information on when to run an OWB Mapping Process flow. Note: I tested this feature with a Mapping but not a process flow. All the documentation I’ve seen from Oracle mentions that Schedules can be applied to either Mappings or Process Flows.

Now we see a wizard, which happens to be a conundrum. Wizards are usually helpful in providing the most common use cases with a honed interface, however the actual Schedule editor is far more intuitive than the wizard as you’ll see below. I think it a bit strange that the wizard is less helpful than the actual editor; any other testers out there have an opinion please email me!

We’ll define a start and end date for the Schedule which defines the bounds on which it will run. Perhaps you’ve built a Mapping that only runs for the next 3 months during an OLTP system migration so you can define when this schedule will no longer run.

And now we’ll set our recurrence… Don’t worry, you can see more details and have more control over the Schedule in the actual editor.

I’ve now created my schedule, LOAD_EMP_EVERY_2_MINUTES and have opened it in the editor. You can see this is a much better way of visually editing the Schedule. It projects the schedule in a hierarchical manner in the Schedule Preview window (you can flatten it if you so choose). It also allows you to set clauses here, so you can fine tune when it will execute. You might need to use this if you need to limit the Schedule so that it only runs on say Monday Thru Friday when the system is populating data.

Next step is to CONFIGURE the Mapping and let OWB know that we intend to use a certain schedule when we “Run” our Mapping (or flow). For this example, we’ll assume we’ve already created and tested an OWB mapping named “STAGE_EMP_TABLE” that’s been deployed and runs properly via the Control Center.

Now, when we head back into our Control Center panel we see a new Item that hasn’t been deployed named “STAGE_EMP_TABLE_JOB.” We deploy this object

We know see we have an available job in our Control Center. Remember that in Paris, nearly all (if not all) of the Control Center runs asynchronously now i.e. the client doesn’t block while deploying an application or running a mapping or process flow.

Now, according to OWB my Job has been scheduled and is running! It shows the job, when it last ran was, and what the return was of that run. In my example, the last run was sucessful and finished at 6PM on 2/25/2005.

I did manually verify the job was running, ensuring that STAGE_EMP_TABLE was executed properly each time. You can also see a list of the runs and which job they are associated with. The control center is showing me that it is currently running the job, and that it ran the job previously twice successfully. Notice the seperation of the time is two minutes apart, so it appears as if my schedule was about right. This is completion time, so don’t be worried if it doesn’t match up because while your job may start at the correct time, it may finish with varied elapsed times.

I do believe the only way to change the schedule, Start and Stop the Scheduled jobs is through the proper OWB Client. I took a quick look at the OWB Web Based browser and it does not appear as if you can make changes. My Paris copy mentions that the browser isn’t even going to be in the Beta release so I am probably just seeing some partial build with an incomplete version of the OWB Browser.

For those that this feature matters, this will be very helpful. OWB is really shooting to be a one stop shop for all your Data Warehouse/ETL needs and the scheduling feature is a nice little piece of that. Happy OWB scheduling to ya!

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

PSOUG OWB LECTURE : DATAMART IN 90 MINUTES

Posted on the March 31st, 2005. Read 534 times

Source: bayon blog [link]

I recently did some training on OWB at my local PSOUG user group, which is not your “run of the mill” user group. Their focus on peer to peer education and quality, technical presenations makes this group stand out in my mind. If you are in the Puget Sound area (the greater Seattle, WA area for those not familiar with US regional geography) you should definitely check out the meetings. The handout from the meeting is here (PDF, Word). Watch for an email about an OWB course at the PSOUG lab!

I promised those in attendance that I would post some notes so that you can do the same excercise. Here are those promised notes, and I hope you find it useful to get you ready and in a state ready to “recreate” the excercise. Please email me with any questions you might have, or if you are planning a Data Warehouse or Data Mart using OWB. I’d love to help out by providing additional coaching, solution planning, or actual implementation.

Step 1 : Install Oracle Database 10g

We assume you know how to install Oracle software and how to use DBCA to create a database. Download the Oracle DB 10g release 1 from OTN here. Follow the instructions to install the product onto your machine. During the process (or after using DBCA) create a database using the Data Warehouse Template, and make sure and install the sample schemas.

Step 2 : Install Oracle Warehouse Builder 10g

There’s not a lot to do at this point. Download OWB from OTN, and follow the instructions on the screen and install OWB into a SEPERATE ORACLE HOME.

Step 3 : Create an OWB Design Repository

Run the following program from your OWB installation ($ORACLE_HOME/owb/bin/unix/reposinst.sh (or if you’re on windows run %ORACLE_HOME\owb\bin\win32\reposinst.bat).

Click “Next”

Enter your sys username, password, Service name, and port number

Select “Create a new Warehouse Builder Repository”

Create and install into a new user

Call it OWB_REP and give it an easy password

The default tablespaces are usually fine, but if you have installed your db differently feel free to change these.

American English is fine in most circumstances and for this demo

Review the information for correctness and then click “Finish”

Hopefully you’re watching this work and

get one of these. You’ve installed an OWB DESIGN REPOSITORY into your database.

Step 4 : Create an OWB Runtime Repository

Run $ORACLE_HOME/owb/bin/unix/runtimeinst.sh (or corresponding file for windows)
Enter in your sys information (sys/password/singapore/1521/bdev.bayontechnologies.com)

Select Runtime Repository

Select Create New Warehouse Builder Repository

Create and install into a new schema

Enter in owb_rt and an easy password twice

The warnings are nothing major… click OK

Default Tablespaces are fine

Install a new runtime access user

Enter in owb_user and an easy password twice

Port number is fine (very few people actually use the name/address cleansing and it costs extra)

Review and Click Finish

Click “Yes” to create a target schema

Select Target Schema

Create a new OWB Target Schema

This is the Runtime Repository we just created, owb_rt/password

Create and install into New user schema

owb_tst/easy password twice

Click OK again on the warning

Default tablespaces are fine

Review and click Finish

You’ve successfully created a runtime repository and a target schema to deploy your test OWB objects to!

Step 5 : Install OWF 2.6.3

Download OWF Server from OTN here.

Click “Next”

This is really important… You must use your database 10g home for installation

Select “Oracle Workflow”

Review and Click Install

Now run $ORACLE_HOME/wf/install/wfinstall.csh (or corresponding .bat if windows)
Install into OWF_MGR, Install, your connection info (bdev, singapore, singapore:1521:bdev.bayontechnologies.com) (LocalName TNSNames, hostname, JDBC string)

Using the SYSDBA account grant execute any procedure to OWF MGR

SQL> grant execute any procedure to owf_mgr;
Grant succeeded.

the SCOTT account by running the following script as SYSDBA

SQL> alter user scott account unlock;
User altered.

Step 7 : Allow OWB_TST to select from SCOTT tables

grant select on emp to PUBLIC;
grant select on dept to PUBLIC;
grant select on salgrade to PUBLIC;.

Step 8 : Get the demo materials

Download the following items which were used during the demonstration:

OWB Paris Webcast

Posted on the March 31st, 2005. Read 485 times

Source: bayon blog [link]

I recently gave a presentation at NWOUG on the new features of Paris and thought it worthwhile to post a similar presentation on my blog.

This is my first time producing a rich media presentation and did so rather quickly with the help of a tool called Camtasia Studio. Consider yourself duly warned: the audio is not that great, I didn’t do any editing, and it’s just me chatting over some screenshots of the product. You’ll need flash (1024×768 too), and I’ve split it up by segments for easy viewing. If you’re already familiar with OWB consider skipping ahead to OWB 10gR2 Overview and continuing on from there. They are each a few minutes, and are not in great depth but give a sense for the feature and it’s applicability.

I do hope it’s useful for those eager to get some new information on the Paris release of OWB. Feel free to drop me a line if you have Q’s about OWB Paris or otherwise.

Introduction
Presenter
Format/Objectives
Why OWB?
OWB ETL Leader
OWB Demo
OWB Applicability
OWB Concepts
OWB Installation
OWB 10gR2 Overview
OWB Improved UI
OWB ETL
OWB Pluggable Mappings
OWB Data Profiling 1
OWB Data Profiling 2
OWB Data Profiling 3
OWB Data Correction
OWB Right Time BI
OWB Change Data Capture
OWB Experts
OWB Impact Analysis
OWB Misc
More Information

PSOUG 23-FEB-05 ON OWB

Posted on the March 31st, 2005. Read 472 times

Source: bayon blog [link]

I’ll be giving a lecture at the upcoming PSOUG meeting on the 23rd of this month. Those locally that are interested in a free “crash course” in OWB should attend this meeting. It’s always free and usually has an excellent program aimed at technical users. Hope to see you at Oracle’s Bellevue office 108th Ave NE, Suite 1300 from 5:30 to 7pm.

NWOUG Presentation on OWB Paris

Posted on the March 31st, 2005. Read 782 times

Source: bayon blog [link]

On Monday I gave a presentation at the Northwest Oracle Users Group (NWOUG) Winter conference. This was the first NWOUG conference I’ve attended, and was grateful to the NWOUG board for accepting my presentation.
I’ve included the presentation slides here (Powerpoint and PDF) for those readers interested.
As many of you know, I’m writing a series of articles on the new features of Paris which I’m hoping to add more to shortly.

Certified Business Intelligence Professional (CBIP)

Posted on the March 31st, 2005. Read 1240 times

Source: bayon blog [link]

I had finished all three exams a while ago, but thought it prudent to refrain from any public displays until the paper certification was in hand. Don’t count your chickens until they hatch, right?

VMware and Oracle Partnership

Posted on the March 31st, 2005. Read 876 times

Source: bayon blog [link]

While looking for some additional information on Oracle and VMWare I ran across a news story I missed last week. VMWare and Oracle have signed an agreement to make VMWare the standard virtualized Oracle environment.

VMware will become the virtualization development platform for Oracle server technologies, and Oracle plans to deploy 7,000 VMware virtual servers internally.

Prem Kumar, VP of server technologies engineering at Oracle, says the use of VMware virtualization products will enable Oracle to better optimize its products for use in multiple Windows and Linux operating-system environments. “We’ve gotten significant feedback from our customers requesting VMware support,” Kumar says. “We will be using VMware as our standard virtualized environment, including building product, testing, and installing, and it will be certified as a standard configuration.” [emphasis added]

or from another article

Under the agreement, Oracle will develop, test and support its 10g database and applications products to run on VMware’s operating environment virtualization products.

As part of the agreement, Oracle and VMware say they plan to work together to provide customers with problem resolution for Oracle’s 10g database and for 10g versions of its application server, Collaboration Suite and Enterprise Manager running on VMware.

Oracle has posted evaluation kits (VMWare machines with Linux/Oracle to evaluate the products) of which more than 12,000 copies have already been downloaded.
I’ve discussed VMWare before (here, and here) and have mixed thoughts about it. I’ve found it’s greatest strengths are in development, provisioning, and manageability. I’ve had some concerns about deploying Oracle on VMWare for production and high availability environments. This new partnership should only serve to address those concerns and ensure that the “entire stack” is fully supported. Great news for VMWare customers!

HAPPY HOLIDAYS

Posted on the March 31st, 2005. Read 747 times

Source: bayon blog [link]

Wishing you and yours a Merry and Happy Holiday Season! May you be healthy and prosperous in the New Year!

Warmest Wishes,
Nicholas A. Goodman
Founder - bayon technologies, inc.

OWB10g Paris : Data Profiling

Posted on the March 31st, 2005. Read 889 times

Source: bayon blog [link]

The new Data Profiling features of the upcoming OWB release will knock your socks off. Other new features are applicable, useful, and successfully augment the product and make it better. Data Profiling opens up a whole new world to OWB with it’s wide applicability outside of Oracle Business Intelligence. In other words, readers should note that this feature is not just for OWB Developers but rather could merit firing up OWB purely to use this feature. OWB developers might be a bit disappointed with how OWB works to manage corrections, which are based on the output from Data Profiling but that shouldn’t dissuade anyone from using this feature.

Most BI and Data Warehousing professionals fancy themselves experts of their organizations’ data; they think they understand their structures, volume, and understand how they live and breathe. In other words, if you are curious which data in your enterprise is least clean consult your DW operations staff and they’ll provide anecdotal tales of why the POS system is a POS (different acronym) and that they “often get erroneous transactions with special adjustment code 99.” What is typical, and ironic given the DW mandate of providing insight and information, is that the DW team can not quantify and measure the quality of their data. What kind of questions could you use to drive data quality initiatives in your enterprise? Knowing the cost of scrubbing and manually restating (Sarbanes-Oxley consultants cringe here) data and the overall cost of making decisions by delayed or improper data wouldn’t you want to know where to start working?

Data Profiling addresses many needs by providing a set of functionality to examine, profile, and measure the quality of data. It provides a set of screens that allow you to point the application at a Data Source and immediately get some valuable information about your data. The Data Profiling features of OWB fall into three broad categories:

  • Attribute Analysis : This type of profiling revolves around the content and structure of particular attributes in your data sets. This involves numerical precisions, distinct values, outlier identification, min/max/avg, declared types versus actual types, and pattern typing.
  • Referential Analysis : Understanding how data relates to other data is a very important component of data profiling. They wouldn’t call it a relational database if the relationships amoungst data weren’t one of the fundamental and most important concepts of data organization. Identifying orphans, joins, childless entries, and redundancy identification all fall under the umbrella of Referential Analysis.
  • Functional Analysis : Beyond the pure join relationships, there are also ways of surmising how the data functionally relates to each other. One can surmise the relationship between attributes and relationships for determination and dependencies. Normalization candidate identification can be achieved by leveraging this analysis (identify which attributes might be prime candidates for factoring out and normalizing into an exterior table).

be as small as a Fax Wizard the Data Profiling features of OWB require a sort of linear progression, especially early on in the process. This makes sense; you have to connect to your data, identify which data is of interest, retrieve and analyze your data, explore and understand your data, and optionally deploy code to monitor and make corrections to your data.

I’ve included a few screenshots, which speak quite clearly to the types of analysis that OWB can perform on your data. It should be noted, that since Oracle retrieves and analyzes the data in a temporary workspace that one can perform this analysis on data contained in nearly any heterogenous supported connection, including plain old flat files.


Figure 1. Screenshot of some of the aggregation statistics available. Note the ability to actually see the detail rows here


Figure 2. If you don’t feel like consuming the analysis in numerical form OWB provides a way to visualize characteristics of your data


Figure 3. Domain stats are very useful for determining status codes, valid product categories, etc. It also shows details and percentages

Probably the feature Oracle is hoping you’ll use most, and why Data Profiling is included in OWB instead of an alternate tool or standalone product is OWB’s ability to monitor and make corrections to data based on the work one does in the Data Profiling section of the product. OWB endeavors to allow you to quickly, and programatically leverage the results of an analysis into working monitors and data cleansing routines. It’s a noble start at implementing easy rule based automated ETL generation and cleansing of data but has some difficulties in ongoing management. See Note Number 1 Below


Figure 4. Based on a rule you’ve defined as part of your profiling, launch the Create Correction wizard


Figure 5. One can define several actions that are implemented to “correct” the data. There are some excellent capabilities for the correction action (Match Merge, Remove, etc)

One can create rules and corrections from inside the Data Profiler that can be deployed to correct and audit your data. Consider a real life example of a data source that contains order information, including order status history records. An order will contain information about the Customer, order amount, etc. Order History records contain information about what occurred to the order and when (received by warehouse, shipped, etc). You’ve analyzed your current data set using the data profiler and you’ve found that Order History records arrive with status codes that are two letter codes, but occasionally) when orders are forced to a different status an operator uses mixed case (”Sh” instead of “SH”) for the status codes. You’ve got 100k order statuses conforming to your dictionary of status codes, and a couple hundred that are off by one lower case character. Your company is entitled to a 10k discount from your fulfillment partner company every month the error rate exceeds .5% off the agreed upon data formats.

How can you leverage the new OWB feature to: audit the error rates of the received data; and correct the simple case of mixed case status codes?

  1. Connect to and identify the data of interest. Depending on the arriving format for the data, you’ll need to decide how to get it imported into the land of OWB and the Data Profiler. If it’s in a SQL Server or Oracle database you can enter in some connection information and connect directly and browse the data dictionary. If it’s in a flat file, you’ll need to sample and describe the data file.
  2. Run the profiler. Review the information that you’re seeing about the particular attribute(s) in question. In our example we’re interested in the STATUS CODE column of the order history records. The profiler will indicate a suggested domain (valid uppercase status codes) and then indicate how many (and %) records do not conform to those specifications. You can drill into these (VERY USEFUL feature) records to see that nearly all are still valid two letter codes, but have mixed case.
  3. Derive a Rule based on the domain of the records. You select the attribute, and use the Derive Rule wizard to create a rule that says the STATUS CODE must be one of the valid upper case codes.
  4. Create a correction based on the rule. You’ve now created a rule that specifies valid values for that column. How does one make corrections to these records so that they can be processed along with their compliant peers? Using the create correction wizard you can specify how to react to the rule violation. In this case one would use some of the provided logic (match merge capabilities of OWB) and define the logic for identifying the mixed case codes, and correct them by applying some TO UPPER CASE logic. This will create a mapping and other requisite structures for executing the logic. Note: The fact that the corrections are themselves OWB objects is a blessing and a curse. More on that toward the end of this article
  5. Add the mapping to your process flows. This will execute, in the place you deem it necessary where the data will be evaluated and possibly corrected.
  6. Schedule an auditing process. Outside of the correction execution you might just want to evaluate the state of the data so that you have a record of data received, it’s error rates, why it failed it, etc. You can do an audit ad-hoc, scheduled, or within a process flow. Naturally many locations will like to retrieve data from their source systems, perform the audit as part of the process flow, and then proceed with any correction and further warehouse processing.

e feature in the upcoming release of OWB. It is certainly useful to a BI consultant who needs to quickly learn and understand a customers data. I hope readers find it as useful as I have.

This blog is part of the OWB Paris Early Review series which reviews and comments on several new Paris features.

NOTE 1: Corrections are generated in a one time process and it generates all sorts of OWB objects (mappings, tables, etc) to accomplish a correction. While this is handy, I’m guessing the ongoing management will be difficult. OWB creates one of their new “nested” mappings for implementing the cleansing rule and then the OWB developer includes this pluggable mapping in your ETL(you can run it standalone if you so choose). It’s a one time generation and you can’t manage the rule. If you change your correction (include this NEW particular code or type) you have to regenerate the correction and possibly re-add it to your ETL.
NOTE 2: Thanks to Paul and Nikolai for their contribution to this blog.

OWB performance, start here

Posted on the March 31st, 2005. Read 803 times

Source: bayon blog [link]

There are literally millions of bits of data and pages in books on how to tune Oracle and PL/SQL. Most important to BI professionals is the need to beat the clock, and have their data ready by start of business(8am) at a minimum. Of course if you’re a right time environment you have even more stringent requirements. It’s likely your data processing takes a while and OWB can give you some great information on where it’s spending time. This is useful if you’re trying to pick a few mappings to “tune” and get the most bang for the buck.

Connect to the OWB_RUNTIME schema, or use select access from another use such as OWB_USER.

select
OBJECT_NAME name,
trunc(min(elapse_time)) min,
trunc(max(elapse_time)) max,
trunc(sum(elapse_time)) sum,
trunc(avg(elapse_time)) avg,
count(elapse_time) count
from all_rt_audit_executions
where 1=1
AND task_type = ‘PLSQL’
AND created_on >= to_date(’12/10/2004′, ‘MM/DD/YYYY’)
group by OBJECT_NAME
order by avg desc;

Expressed in Seconds… Yields the following results (maps changed to protect the innocent)

NAME	MIN	MAX	SUM	AVG	COUNT
map1	0	31988	165372	20671	8
map2	5494	68905	135111	19301	7
map3	1672	3509	20542	2567	8
map4	316	3511	14502	1812	8
map5	1018	2170	13089	1636	8
map6	436	1353	6784	848	8
map7	478	1272	6476	809	8
map8	309	2243	5351	668	8
 ..................

You’ll see this identifies several candidates for tuning. You also have an idea of what kind of performance gain you can hope to expect. If you improve the avg time of map1 by 10% you’ll save yourself approximatley 20671 * .9 / 60 = approx 35min per execution. This is not hard and fast, seeing as though some tasks run in parallel (if map 4 is always waiting for map3 in a process flow, you won’t cut down your time).

Howard Rogers is blogging

Posted on the March 31st, 2005. Read 825 times

Source: bayon blog [link]

Howard Rogers has always had some excellent articles posted on his site but now offers additional published nuggets on his new blog. Always a fan of great content no matter the form…

SQL Task in SSIS with external file

Posted on the March 31st, 2005. Read 712 times

Source: SQL BI [link]

SSIS has a feature I desperately wanted in DTS for years: you can use an external SQL file as a source SQL statement for the SQL Execute Task. Unfortunately this is not available for Data Flow Source component.

I tried using variables to load the SQL command for a Data Flow Source from an external text file, but you need to write a Script Task and load the file content into the variable. I’d like a custom property editor for the Value property of a Variable to insert a multi-line string (as a SELECT statement) - with February CTP you can’t use copy/paste and you need to edit the DTSX in XML to insert the initial SELECT statement into the variable (otherwise you can’t use the designer to define the Data Flow components).

Probably a better approach is to define several stored procedures and write a SQL Execute Task which execute external scripts to define stored procedure and then to use the stored procedure as Data Flow Source text command. I used a similar approach in DTS 2000. The problem is that probably the package doesn’t know that it has an external dependency, so the deployment of external scripts has to be done by hand. But it would be a good solution if a BI solution could deploy into the bin directory each file included in the solution and configured to be deployed. Actually it is possible with VS2005 so I’d like to have this chance in a BI project. Someone know a way to do it? (I already asked on the SSIS newsgroup too).

UPDATE: I just discovered that enabling the “CreateDeploymentUtility” flag into project copies all miscellaneous flag included in the project into the Deployment directory. Exactly what I wanted for deployment!


This Blog Hosted On: http://www.SqlJunkies.com/

DMReview : “The Dawn Of The BI Director”

Posted on the March 31st, 2005. Read 418 times

Source: Mark Rittman's Oracle Weblog [link]

The Dawn
Of The BI Director
: "In most organizations, business intelligence (BI)
has always played second fiddle to transaction applications. Organizations have
traditionally lavished millions of dollars on projects to implement packaged
operational applications and rewarded the IT managers who supervised these
projects with hefty salaries. In contrast, until recently, BI projects have
always been an afterthought, viewed by executives as departmental systems or the
responsibility of a few programmers who write custom reports. A career in BI was
seemingly a dead-end for ambitious IT professionals.

Today, the stepchild status of BI is changing. One sign of change is the
growing stature of BI Directors who oversee data warehousing and business
intelligence projects. These days, BI Directors regularly manage multimillion
dollar maintenance and capital budgets, oversee sizable teams of developers and
contractors, and negotiate large deals with software vendors. More importantly,
BI Directors now manage what their top executives consider to be a "strategic
resource" that supports fact-based decision making and keeps the organizations
headed in the right direction."

‘Most important ever’ MySQL reaches beta

Posted on the March 30th, 2005. Read 387 times

Source: OLAP/BI/IM stuff [link]

‘Most important ever’ MySQL reaches beta - ZDNet UK News: “‘People have been criticising MySQL since we started [in 1995] for not having stored procedures, triggers and views,’ said Axmark. ‘We’re fixing 10 years of criticism in one release.’

‘It’s the most important release we’ve ever done in the company,’ Axmark added. “

Open Source OLAP - Mondrian Review

Posted on the March 30th, 2005. Read 464 times

Source: OLAP/BI/IM stuff [link]

The Bayon Technologies blog has the best review of Mondrian I have seen yet.

Bayon Technologies :: Open Source OLAP: “My overall impressions were positive; it’s a good core set of functionality and performs rather well. Like any Open Source project it is an alphabet soup of supporting libraries, environment variables, generators, frameworks, and takes more than the usual 10 minute commercial product install. We’re not building a kernel here, but it’s not trivial to get the examples up and running. “

Next Page »