BI Blogs

Bringing together Business Intelligence voices from across the web

Oracle BI EE 10.1.3.4.1 – Do we need measures in a Fact Table?

Posted on the May 31st, 2010. Read times

Source: Rittman Mead Consulting [link]

I know this blog title can be classified as a deviation from my more conventional ones but i have been planning on writing this for a while now, so hopefully this should be of some interest to everyone. As we start working with BI EE more and more, we get to know what its good features are and along with it we get exposed to its not so good ones(as with any tool) more. Recently i came across a requirement where BI EE was being implemented on a Data Warehouse that was built almost 4 years ago. The Data Warehouse had been built based on Kimball’s model and it had been satisfying the requirements of the customer so far(excel & MSAS reporting) unless and until BI EE came in as potential replacement for MSAS. When the customer first told me this, i was kind of confused on why BI EE failed to meet their requirement considering the fact that it was a very simple data warehouse(standard set of dimensions and 3 to 4 fact tables). On further analysis, there were 2 main requirements that stood out to be showstoppers in using BI EE

1. Ability to use any dimension member(Across dimensions) as a measure and the ability to use any measure as a dimension attribute – This is one key requirement that cannot be solved easily by BI EE currently. The problem with Business Model and Mapping layer of BI EE is the fact that we have to identify measures. Though this will work in most cases but if we want to do true multi-dimensional reporting, then the tool should ideally provide us with an ability to even treat measures as dimensions. For example, consider the report shown below

Picture 18

This is a very simple report where Amount Sold and Quantity Sold are measures (both in the data warehouse as well as in BI EE). But what if the customer requirement is to create a report as shown below

Picture 19

In the above case, Amount Sold and Quantity Sold measures in the first report have become dimension attributes. Also, the Channel dimension attributes Direct and Indirect have become measures in the second report.

In my customer scenario, end-users required the ability to treat any dimension as measure and vice versa. So, clearly they wanted a multi-dimensional reporting capability using their existing Data Warehouse and using BI EE. In retrospect, i would say this is a very common requirement for users who are used to Multi-dimensional tools like Essbase, MSAS etc. In some cases, with BI EE 10g, where we need to treat dimensions as measures, we just use Pivot Tables. But in cases, where measures in BI EE need to be shown as dimension attributes, then the only way to solve such requirements is to use set operations like using UNION etc.If we use them in Answers, we will start facing all sorts of issues like losing drills, losing the ability to use is prompted filters etc. There are other methods as well like the one that i have discussed before here. But the problem is the fact that we cannot make any method (BI EE modeling methods)generic enough to solve all these requirements.

2. The second requirement was to provide drilling ability on measures. For example, consider the report below. Here the 2 measures Amount Sold and Quantity Sold have been grouped under SALES. Similarly there are another 2 measures, Unit Price and Unit Cost which have been grouped under COSTS.

Picture 20

Drilling on COSTS or SALES should give us the individual measures.

Picture 21

Picture 22

Though the above example demonstrates a single step Measure hierarchy, the actual requirement was to have a multi-level measure hierarchy.

Ideally, considering the requirement, this would have been solved by using Essbase & Smart View. But considering the fact that the customer had already bought BI EE, a solution was to be arrived at using BI EE 10g. There were 2 things that we knew & were sure of

1. The data warehouse was working well. The customer was open to tweaking it a bit further to enable this requirement within BI EE.
2. Essbase provides this capability and will quite easily solve the customer’s requirement.

We even thought of probably asking the customer to wait for BI EE 11g, but again the actual problem was in the fact that measures were actually required by BI EE which i don’t think will go away in the near future since the fundamental design of BI EE RPD revolved around the need to identify measures and dimensions(we weren’t sure of what changes in the RPD were coming in 11g and hence we had to find a solution in 10g itself). This is where we had to devise a DW model using the concepts of Essbase and then retrofit it back to the DW as well as BI EE.

I will illustrate what we did for the customer considering the SH schema of Oracle which is based on Kimball’s DW methodology.

Picture 23

As you see, we have 2 fact tables(of differing grain) and multiple dimension tables linking to the facts. Each fact table has a set of pre-defined measures as shown below

Picture 24

Picture 25

From an end-user perspective, each of the measures above are nothing but attributes of data. So, in effect if you model this in Essbase, what essbase will do is, it will maintain an internal single measure called (<Data>) and then everything else are nothing but attributes of this single measure(i.e dimensions. How these are internally stored within Essbase is determined by the 2 storage options BSO and ASO). So, to model this similar to Essbase, we start with separating out measures into its own dimension table called Measure as shown below

Picture 27

Then we create 2 new Materialized Views, Sales_Data & Costs_Data, on top the Sales and Costs fact tables as shown below

Picture 28

So the actual relationship diagram between the fact and dimension tables will look as shown below

Picture 29

So far so good. What this enables us is a fact that we can now sort of treat all measures as dimension attributes. But how do we now provide the ability to treat dimension attributes as measures.

This is done by applying Filter function on the DATA column. For example, to create a measure called Direct Sales we create a logical column and then apply a function as shown below

FILTER("SH - Multiple Measure Dimensions"."Sales - Facts".Sales
USING
"SH - Multiple Measure Dimensions".Channels."Channel Class" = 'Direct')

Picture 31

This way, we can create as many logical measure columns as possible depending on what we need to treat as measures in the presentation layer. The good part about this is, we can use an attribute as a measure and also as a dimension.

The idea is we build a model containing the following presentation layer

Picture 30

For every dimension table, there will be a corresponding Fact Measure table with measures required for reporting.

So, to achieve the first requirement where we need to convert the measures as dimension attributes, we choose the following columns in the report

Picture 32

Picture 19

If we look at the SQL of this report, you would notice that FILTER function converts it into relevant case when statements thereby providing us with the ability to do multi-dimensional analysis

select T14286.PROMO_NAME as c1,
     T14409.MEASURE as c2,
     sum(case  when T14224.CHANNEL_CLASS = 'Direct' then T14394.DATA end ) as c3,
     sum(case  when T14224.CHANNEL_CLASS = 'Indirect' then T14394.DATA end ) as c4,
     T14286.PROMO_ID as c5,
     T14409.MEASURE_ID as c6
from
     MEASURES T14409,
     PROMOTIONS T14286,
     CHANNELS T14224,
     SALES_DATA T14394
where  ( T14224.CHANNEL_ID = T14394.CHANNEL_ID and T14286.PROMO_ID = T14394.PROMO_ID
and T14394.MEASURE_ID = T14409.MEASURE_ID and (T14224.CHANNEL_CLASS in ('Direct', 'Indirect'))
and (T14409.MEASURE in ('Amount Sold', 'Quantity Sold', 'Unit Cost', 'Unit Price')) )
group by T14286.PROMO_ID, T14286.PROMO_NAME, T14409.MEASURE, T14409.MEASURE_ID
order by c1, c2

Similarly to solve the second requirement, we just need to pull the relevant measure attributes from the measure dimension and then use the DATA field for our measure

Picture 33

Picture 20

There are significant advantages of this method as this does not fix us to any specific dimension/measure. It provides an ability where we can swap measures and dimensions at will.

So the question to everyone reading this is, do we need measures at all in the Fact table(apart from the implicit Data Fact column)? Majority of the tools like BOBJ, Cognos etc all require measures to be assigned explicitly. How about forking out the measures into their own dimension and then treat everything else as normal? Are there any downsides to this approach? The features offered seem to outweigh the conventional modeling method. But i am aware that there could be potential problems with this as well. One problem which i could see is the performance. The size of the fact tables can grow quite quickly if there are a lot of measures. Also, the end-users might have to get used to this model. Is there anything else that you see could become potential problems in the future? If so, do put your thoughts in the comments section.

Algunas normas en el trabajo

Posted on the May 31st, 2010. Read times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]


SSIS 2011 – what’s planned for the new version

Posted on the May 30th, 2010. Read times

Source: Ella Maschiach's BI Blog [link]

This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.

 

The roadmap for SSIS 2011 - what will exist in the future version of Integration Services 11

Today everything exists in a package.
In SQL Server 2011 there is an SSIS Server which uses SQL Server OS to manage data flows without being confined to a package. Most of the work is around building the server and the tools to manage the server.

A new designer interface which will be built around common tasks.
A designer with a walkthrough according to the functions or transformations that you've added.

MetaData management: 

  1. Where does the data come from
  2. What is affected from the package downstream - cube, SharePoint list

SSIS 2008 Performance and Scalability

Posted on the May 30th, 2010. Read times

Source: Ella Maschiach's BI Blog [link]

This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.

 

SSIS 2008 Performance and Scalability

 

The importance of measuring to improve performance

Understand and measure hardware:
How many CPU cores?
How much memory?
How fast is the I\O subsystem?

Understand the potential bottlenecks:
Understand the limits of the Source System (usually a bottleneck as it's usually slower than the new system)

Gains from the source system:
Find better drivers
Configure the driver
Optimize I\O and network configuration

Measure Speed per Connection: Row / sec = Row Count / Time of the data flow  

Improving the speed of the Source
Use multiple connections (not possible for a text file, but is possible for a DB with a query that takes CustID 1-1000 and then 1000- 2000 etc)
Convert the data from string to the relevant type (like int) already at the source - can create better performance.

FastParse - doesn't have any error handling. We have to guarantee that the integers are indeed integers else we won't get the error and it just won't execute. We gain better speed.

Don't assume you know where performance is best. Test it to see where it will be best - source, transformation or destination.

SSIS will take all the memory possible for it. There is no memory management for it at the current version, though it is planned for future release.

 

Tuning the Data Flow

Use the NOLOCK hint to remove locking overhead in large table scans
SELECT only columns you need

Network Tuning
Change the network packet size in the connection manager - put in the max of 32767, cut it by half and see how it changed performance.

Make data types as narrow as possible
Casting data types is expensive, especially in dates
Think of money, float and decimal - consider how effective they are.

SQL destination - guarantees great performance if the DB is on the same server as the SSIS.

Use partitions and partitions SWITCH load data in parallel and in great performance and then use SWITCH to connect all the tables together.

Even if you use split, you don't have to union all them. You can load the data in parallel to the DB. This is dependant of having an index on the table.

Simplify the logic as much as possible.

Change the Design
Don't Sort unless you absolutely have to (a time you do need - joining data when one source is text file that needs to be sorted). Usually you don't need to use Sort. Use the capabilities of the Source System (indexes, set based operations).
Sometimes T-SQL is faster - taking advantage of Indexes or set operations: an Update by T-SQL, group by, sum.

Delta detection: if more than 10% of your data has changed - reload it!

Do minimally logged operation if possible.

Designing for parallelism
Partition the problem: partition source data, partitioning the target table.
Design to stay in memory.

Schedule it correctly.
Right now, we're using SQL Agent, suppose to get a scheduler for the next version of SSIS.

Lookup in SSIS gives good performance, close in performance to T-SQL.

SSIS 2008 improvements over SSIS 2005

Posted on the May 30th, 2010. Read times

Source: Ella Maschiach's BI Blog [link]

This is continued live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv of Data Integration in Large Organizations.

SSIS 2008 improvements over SSIS 2005

C# scripting can reference all .net assemblies
VSA replaced by VSTA

Pipeline Scale Up

Tuning SSIS through looking at the buffers on the Data Flow.
In SSIS 2005 you would add buffers and break up flows

SSIS Tip:
Tools > Options > BI Designers > IS Designers:
Control Flow Data Connect
Data Flow Data Connect
Check them both to connect the components automatically

Use Row Sampling component to do performance testing.

One buffer structure goes all through the process.
We load data in the beginning but we don't need it to move in memory. The problem is when one buffers slows down the process, it slows down for all the buffers before him as well.
In 2008 we have benchmarks for each buffer and if needed, create buffers artificially (rather than manually as we had to in 2005).

Asynchronous component - the input and output are different. We don't know what have in the middle.
Sort - asynchronous, creates a new buffer as you can exclude a column.
The other components don't need to be held up when an asynchronous component doesn't run quickly, as they don't depend on it.
Advanced Editor for the component: SynchronousInputID exists for synchronous components (and not for asynchronous).

ADO .net support - for improved source UI and added destination adapter (enabling you to use SSIS to load data into providers other than SQL Server).
The performance is still 30% slower because of the use of ADO .net provider, but it enables greater flexibility.
Use OLEDB providers as is, preferably don't use .Net OLEDB providers through ADO .net providers unless you have to.
When you connect using the ADO .net, you will only be getting a list of tables. If the data isn't in shape of tables, you won't see anything in the dropdown list.

Use ADO .net if you have to use ODBC, or if the component you're using requires it, or if you need it for your Script component.
Your default should still be OLEDB.

Import \ Export Wizard is much smarter:
ADO .net support for source and destination
Data types conversion - does only accurate conversions

Starting Import \ Export from the program menu enables straight execution. Starting it in SSIS Package requires execution to be done manually.  

Persistent cache usage reducing DB and memory usage.
Cache in Lookup enabling more control over sharing and lifetime of reference data.
Cache potential cons - requiring more disk storage and data in it available only for SSIS.

New to SQL Server 2008 that's useful for SSIS 2008:
SQL Merge Syntax
Change Data Capture CDC (works only against SQL Server or through partner product against other sources)

Enhanced supportability:
In case SSIS crashes, SuperDump collects all the data needed to analyze why it crashed. Enables quicker time to analyze and treat the problem.

Data Profiling which enables you to react to the data quality. Enables analysis of the amount of nulls in a column, the uniqueness of a field etc.

Community Samples on Codeplex - code for you to use freely (SharePoint List adapter for instance).

More connectivity out of the box in SSIS 2008. The Attunity connector which is inside SSIS 2008 moves data into Oracle quicker than with the connector Oracle gives you.

Live Blogging Data Integration in Large Organizations

Posted on the May 30th, 2010. Read times

Source: Ella Maschiach's BI Blog [link]

This is live blogging from the Donald Farmer seminar from the SQL & BI convention in Tel Aviv. I would really like to thank Hamada who lent me his netstick so I can blog - Thanks Hamada!

Business Intelligence is about people not about IT.
Your data is always connected to real world and in direct connection to it. It's important to know your business for you to design a good Decision Support system.
BI has always been in the middle of the spectrum - tactical decisions. CEO and operational workers will usually look just at your BI reports, they won't create them themselves.

For the CEO, you have strategic decisions - Dashboards and Scorecards. Getting data aggregated.
For Operational Workers, you need embedded reports, analytic APIs so that they can get the info from where they're working (no special knowledge needed). Getting the data to the right place.

SSIS for the operational level. SSAS to model the operational data giving a consistent view of the data for the entire organization. PerformancePoint Services giving the high level strategic view.

Decisions flow from the strategic level down to the operational end. Traditional BI from the tactic level may move up to the strategic level and change their decision.

Questions you need to ask yourself: Who am I giving the info to? What level of info do they need? How will they use the info to make decisions?

Prototype a BI project. Test the accuracy and performance of the DW and also see that people actually use the solution you developed. Do Beta testing to see how people use the reports - look at the query log their using and tune the DW according to that.
An outer user should be treated differently only from the security aspect, but except for that you treat them just as any other user. This is unless you just give them a report rather than let them use freely the data.

Operational - won't learn new application - reports, embedded functions, mobile BI
Tactical - reports, cubes, annonationas and SharePoint workflow.
Starategic - PerformancePoint Services, annonationas and SharePoint workflow.

SharePoint is the largest growing server in Microsoft. SharePoint is growing because people need to share info and work on it together in a managed way. SharePoint 2010 has social features as well, as collaborating through social networks has become more and more important. Where you're collaborating and where you're sharing, is now also where you use your Business Intelligence, which is also integrated into SharePoint.

1. IT oversight - how much control IT has

2. Organizational alignments - coordination with the goals of the organization

3 S of a DWH:
Scale
Standards - built to. Critical. Laws for instance about HR DWH (for instance) and who do you allow against it (Security).
Skills

DWH and Reports - high on 1 + 2
Spreadsheets & desktop databases - low on 1 + 2

Between the two there exist:
Shared sheets, Databases
Departmental Data Marts

Sharepoint as a way to bind all of these together.

New to SQL Server 2008 R2 - Master Data. Master Data is not a DW. DW is for analysis and hold data of current state + history. Master Data is about the current state of the Business Entity - is for operational use. Master Data - keeps the data synchronized for all the departments.

Next version of SQL Server in 2011 will include more Data Quality capabilities based on abilities acquired from Zoomix.

ETL - Extratct Transform Load. Moving data in the organization. SSIS.
EAI - Enterprise Application Integration. Moving messages in the organization. Biztalk.

Biztalk and SSIS work both on 3 layers: Data, Logic, Presentation. Coordinate on all levels.

Request and Reply - with ETL updates. Cross reference table is updated by an ETL process in batch mode.

VM Ware Networking Options

Posted on the May 29th, 2010. Read times

Source: Blogging about all things SAS [link]

How does networking inside a virtual machine work?

Your VMware Fusion virtual machines talk to your network using a virtual network adapter. Inside your virtual machine, the guest operating system (Windows, Linux, etc.) believes that it is equipped with an ordinary (wired) Ethernet card. But Fusion patches this card to your Mac’s regular network connection, regardless of whether it is wireless or wired.

What choices do I have for how my virtual machine connects?

Using Fusion’s Virtual Machine menu, you can choose among bridged mode, NAT mode, and host-only mode. NAT mode is the default.

network-options.png

What do these choices mean, and how do I choose the right one for me?

We’ll examine each of these network modes in turn.

What is bridged mode?

Your home or office network is probably equipped with a router for talking to the Internet. Bridging is a network term that describes extending a network without using a router. When you place your virtual machine’s virtual network adapter into bridged mode, your local wired or wireless network is effectively extended to your virtual machine. Your virtual machine becomes a peer of all the other computers on that network.

bridged1.PNG

Although your virtual machine will connect to your local network using the same hardware your Mac uses, the virtual machine will retain its own fully independent network identity. Mac OS’s networking features cannot tell the difference between your virtual machine and a PC on your local network; Mac OS interacts with them over the network in exactly the same way.

Many users’ Macs get their IP address from their wireless base station or router, using a piece of software built into those devices called a DHCP server. In bridged mode, if the operating system inside your virtual machine (Windows, Linux, etc.) is configured to request an IP address from a DHCP server, your virtual machine will get its IP address from the same DHCP server your Mac uses.

What is NAT mode?

NAT is a network technology that protects one network from another. For example, your router probably also acts as a firewall: it protects your Mac by preventing unknown computers on the Internet from connecting directly to it. The term “NAT” is an abbreviation for “Network Address Translation”; the protection afforded by a firewall works (in part) by the router representing your Mac on the Internet. Your router substitutes its own address for your Mac’s.

nat1.PNG

VMware Fusion’s NAT mode is the same technology, but one layer closer to you. NAT mode protects your virtual machine from the other computers around it by placing the virtual machine on an isolated virtual network. Whenever your virtual machine wants to communicate with your office network or the Internet, it does so through a software firewall. This firewall is a component of VMware Fusion that runs inside Mac OS.

Notice that, in this situation, your virtual machine is not on the same network as your Mac and your router. Compare this to the diagram of bridged mode, in which your virtual machine is a peer of your Mac and your router.

In NAT mode, your virtual machine will get its IP address from a DHCP server supplied by VMware Fusion that runs in Mac OS, just as the firewall does. But this IP address will only be used for relaying communication between your virtual machine and the software firewall; that firewall will represent your virtual machine for its network communication with the outside world. Another way to think about this: from the Internet’s perspective, your virtual machine is sharing your Mac’s IP address.

What is host-only mode?

In host-only mode, your virtual machine is not only protected from your local network and the Internet, but also locked out of them. The virtual machine’s network world is wholly within your Mac.

hostonly1.PNG

Just as in NAT mode, your virtual machine will get its IP address from a DHCP server supplied by VMware Fusion that runs in Mac OS.

How do I choose the right network mode for me?

The right network mode for your virtual machine depends on how you plan to use it. In an office or home-office environment, with network printers and file sharing, bridged mode is probably best, particularly because bridged mode allows your virtual machine to use Apple’s Bonjour technology for finding printers. Windows’s workgroup features need bridged mode; only in bridged mode will a Windows virtual machine’s Network Neighborhood show the other PCs on your local network.

NAT mode is ideal when you want your virtual machine to be protected from other computers on your local network, such as when you are using a public wireless connection. Because, in NAT mode, your virtual machine shares your Mac’s IP address for purposes of external communication, it’s especially ideal for use when you are using a paid wireless service, such as in a coffeehouse or hotel. The use of bridged mode would require you to pay twice: once for your Mac and once for your virtual machine.

Host-only mode is useful for environments where your Mac has no network connection at all, or when you wish your virtual machine to be completely isolated from the rest of the Internet.

That’s a lot of choices. How do I keep them all straight in my head?

Here’s a way to think about the network choices:

Mode Nickname
Bridged Home-office mode
NAT Starbucks mode
Host-only Airplane mode

Can I change from one mode to another?

Yes! But, unless you have configured your virtual machine to use fixed IP addresses, you must tell the operating system inside your virtual machine to release and renew its IP address. Renewing your IP address after you change network modes will automatically contact the correct DHCP server: the one on your local network if you changed to bridged mode, and the one provided by VMware Fusion if you changed to NAT or host-only mode.

To release and renew IP addresses within Windows, open a command prompt using Start -> Run -> cmd . Then, in the command-prompt window, give first this command:

ipconfig/release

then this command:

ipconfig/renew

Rebooting the virtual machine will also cause it to obtain a fresh IP address.

It’s a love thing…

Posted on the May 29th, 2010. Read times

Source: Ella Maschiach's BI Blog [link]

It's been three years that me and the blog have been together (truthfully, the date was on the 15th, and I didn't forget honey, I was just really busy with work, so you understand now don't you?… Ok, I'll get you flowers :))

I have to admit that looking back at older posts is sorta weird, cause I think my own voice in my writing has changed since then. But then again, I guess you never stop changing your own voice in your writing, just as you never stop changing yourself.
I also really learned to appreciate the blog more, as time passed. Just a few weeks back, Microsoft had organized a meeting for their IT bloggers. I was there. And it really made me look at all I got from the blog - both professionally and personally. You don't expect that from a blog, or at least I didn't…

Still, I know that you, yes you, my dear reader, have also contributed to this. I have gotten along the years quite a few comments on my posts that lead to new posts. That's because you helped me with your comments look into something interesting or challenging. That's how my post about Recursive Sum for a Matrix in SSRS started, back in the day.  
Also, I understood, you really like free stuff - so it's not just me (yay! :))
And that you're also interested in data.

I would really like to hear from you personally. What is it you would like to read about most in this blog?
Are you looking to read more about best practices of design and development? More about the fundamentals of MDX (or perhaps DAX)? More about Report Builder in its newer versions? More free stuff? More videos and less text?
Post me and the blog a comment - we'd love to hear from you.

Ok, now enough with the gushing - hit it Fatboy Slim!


.com/~r/EllaMaschiach/~4/9tqeL0fIn-k” height=”1″ width=”1″/>

Apple App Store vs FSF

Posted on the May 28th, 2010. Read times

Source: James Dixon's Blog [link]

A

Aanmelden AMIS Query “An Evening With…”: Doug Burns

Posted on the May 28th, 2010. Read times

Source: Weblog for the Amis technology corner [link]

Op Donderdag 17 juni, vanaf 18:00 uur, zal Oracle Database Expert en Oracle ACE Director Doug Burns(Schotland), een kennisavond vullen met live demo’s op basis van de Oracle Enterprise Manager Diagnostic en Tuning Pack.
Deze bijzondere avond, geheel zonder slides, met de naam “How I Learned to Love Pictures – Oracle 10g/11g Performance Analysis Using […]

Aangeboden: ADF 11g Training – 7-11 juni (Nieuwegein)

Posted on the May 27th, 2010. Read times

Source: Weblog for the Amis technology corner [link]

Van maandag 7 tot en met vrijdag 11 juni verzorgen Luc Bors en Lucas Jellema een 5-daagse ADF 11g training – in het kantoor van AMIS in Nieuwegein. Deze training is bedoeld voor ontwikkelaars die met ADF 11g aan de slag zijn of willen gaan en behandelt alle onderdelen van het framework: ADF Business Component, […]

Analytical marketing roadmap - part 2

Posted on the May 27th, 2010. Read times

Source: The sascom magazine blog [link]

This is the second entry in my series interpreting what it means for Chief Marketing Officers to become analytical marketers. Part 1 in the series, which covered planning, can be found here. This post describes phase two, the project kick-off. I’ll start by listing phase two objects, and then provide details on how to achieve them.

OBJECTIVES OF PHASE TWO

  • Develop a small, core team to lead the later phases of the project and a wider team who will take on specific tasks and shape details.
  • Get a pilot project underway that will deliver a quick win to quite the sceptics.
  • Understand what customer data and information you already have at your disposal.
  • Deliver departmental data stores and analytics.
  • Create the first (analytics- and deliverable-focused) departmental metrics and incentives.
  • Prototype a working methodology (and tools) for sharing knowledge.

– THE KICK-OFF MEETING
Vital to the success of any change program is the explicit support of the senior executive team (in most organisations, those with the most influence to change our behaviour are our immediate line managers and the Chief Executive Officer). Plan a high profile kick-off meeting with senior executives and the active participation of the CEO. Task the management team with cascading down the project rationale and cascading up constructive feedback (and, hopefully, names of volunteers). The CEO needs to personally commit to the project, being a vocal advocate throughout its entire lifetime, including the inevitable times when there are setbacks or progress is slow.

Continue reading “Analytical marketing roadmap - part 2″

Realtime Data Warehouse Challenges – Part 1

Posted on the May 27th, 2010. Read times

Source: Rittman Mead Consulting [link]

In a previous part of my notes on Realtime Data Warehousing I mentioned some of the challenges of reducing latency. The piece picked up quite a few comments – to which I say thanks to all that posted responses. One of the comments from Matt Hosking mentioned some of the points I was to raise in this posting.
If you ask someone on the outside of developing a (near) realtime data warehouse what the greatest challenge will be they probably would say “capturing the change” since they know we can already “do” data warehouses. I think that is wrong, capturing change is easy; the big problem is applying that change in a timely fashion to a data warehouse that also remains available for query. Adding relatively few rows of new fact to a table is trivial compared to the actions needed to validate, transform, apply keys, index, and publish the fact; and then think about the impact of merging that new fact into existing aggregate tables or materialized views. A lot of moving parts, a lot of challenge.
Realistically, we could populate an “atomic data store style” layer in realtime with what is in effect a versioned (timestamped, journalized or however you term it) replica of the source, a replica which is probably suited for realtime reporting but what we don’t get are the features of a data warehouse that we come to expect in a traditional star schema DW. We possibly miss out on: data validation through the ETL process, data enrichment and derived measures, conformed dimensions, slowly changing dimensions (especially type 2 SCD) through surrogate keys. It may well be that you don’t actually need a star model, after all one of the viable DW models for an Exadata warehouse is just that; a bunch of conventional tables joined on the natural business keys.
Another point to consider is that it is quite unlikely that all of the fact domains in a data warehouse need to be realtime ones; for example data sourced from a supplier’s EDI feed may arrive far less frequently than, say, sales transactions from the company’s web-store. Obviously, if we have realtime feed of sales, we must ensure we have all of the dimensional (reference) data loaded before a new transaction arrives, or else develop robust ways to handle this. This is a situation where we need business knowledge; if a new customer can be created at time of purchase (as often is the case for a web sale) we will need a realtime customer feed along with the realtime sales feed, but for banks with strict money laundering regulations customers are registered way before transactions occur, so a timely load of customer is likely to be sufficient.
Not only is it unlikely that all data feeds to a data warehouse need be realtime, it quite likely for some “facts” that only some measures are realtime measures. Consider sales: we know the quantity and the price charged to the customer at the time of the sale, but we may well not know the cost of goods until the time the order is fulfilled.

Hadoop on Azure?

Posted on the May 27th, 2010. Read times

Source: Chris Webb's BI Blog [link]

Here’s something interesting I’ve just seen on James Dixon’s blog: apparently Microsoft is preparing to provide Hadoop on Windows Azure. Here’s the article James links to:

http://www.sdtimes.com/link/34319

I wonder if this is just a stopgap, in response to customer demand, as the article suggests it might be? How does this fit with the recently-announced Technical Computing Initiative? Was Project Dryad a dead end?

The 2010 BI Forum Quiz

Posted on the May 25th, 2010. Read times

Source: Rittman Mead Consulting [link]

Last week at the BI forum I ran a quiz.

It was a a light hearted affair with specialist subject questions mixed with general knowledge.

The winning team overall was JEH, and the most obiee skilled team was the Dream Team. Top prizes included some Brighton rock and a pencil!!

I had to dig around in the documentation to find some of the questions, and found a few tough ones, so I was delighted that everyone did pretty well. If you were in a team you may want to see the results below!

Here are the questions. See how many you can get.

I may post the answers later!

1. What can’t you ‘manage’ in the Admin tool

a) Projects

b) Sessions

c) Security

d) Variables

e) Models

f) Cache

2. Who thinks Coding in UDML is fun

a) Andreas Nobbman

b) Christian Berg

c) Mark Rittman

3. Which if these is NOT a view in OBIEE

a) Text

b) Legend

c) Static Text

d) Narrative

4. Which of these is NOT a table type in the physical layer.

a) Materialized View

b) Stored Procedure

c) Physical Table

d) Select

5. What is this Icon used for?

a) See who has access to a page

b) See more details on a group

c) Add a column to a page

d) Add a user to a group

e) Add a page to a dashboard

6. What does the UDML statement “DECLARE ENTITY FOLDER” create

a) Presentation Catalog

b) Presentation Table

c) Physical Display Folder

d) Logical Display Folder

7. Which of the following is NOT valid UDML

a) DECLARE ATTRIBUTE

b) DECLARE FOLDER

c) DECLARE TABLE KEY

d) DECLARE COLUMN

8. Which of these is NOT a real utility

a) nQSStartup.exe

b) nQSChangePassword.exe

c) nQLogViewer.exe

d) nQCmd.exe

e) nQUDMLGen.exe

9. Which of the following is NOT a real setting in the privileges section of the web based admin

a) Analyze BI Publisher Reports

b) Access to RSS Feeds

c) Manage Privileges

d) Edit My Dashboard

e) Add/Edit Nested Request View

10. What year saw the release of Siebel Analytics 7.5

a) 2002

b) 2000

c) 1998

d) 2004

11. Which of these is NOT a real Oracle Function

a) NEXT_DAY

b) TRUNK

c) NEW_TIME

d) EXTRACT

e) MONTHS_BETWEEN

12. Which of these is NOT a real Admin Tool Utility

a) Repository Documentation

b) Remove Unused Physical Objects

c) Aggressive Persistence Wizard

d) Oracle BI Event Tables

e) Rename Wizard

13. What setting do you add to instanceconfig.xml if you want group dashboards together in a single hyperlink

a) DashboardMenuLinks

b) PortalBannerLinkItems

c) DashboardMaxBeforeMenu

d) DashboadLinkItems

14. Which SQL statement is valid

a) CREATE OR REPLACE FUNCTION AW3(vInt IN PLS_INTEGER)

b) CREATE OR REPLACE MATERIALIZED VIEW AW2 AS SELECT * FROM AW1

c) CREATE OR UPDATE VIEW AW4(COL1) AS SELECT COL1 FROM AW1

d) CREATE OR REPLACE TABLE AW1(COL NUMBER)

15. What was the name of the original company that developed OBIEE

a) Nquire

b) Majendi

c) Oracle

d) Siebel

16. Which of the following functions is NOT valid

a) Week_Of_Year.

b) DayOfQuarter

c) Week_of_Quarter

d) DayOfWeek

e) DayOfYear

f) DayOfMonth

17. Which function returns the first item in the list that the user has permission to see

a) INDEXCOL

b) CHOOSE

c) CHOOSECOL

d) INDEX

18. True or False: To make a dashboard visible to one group only, that group needs to be added to the repository

a) TRUE

b) FALSE

19. Which well known blogger has his own brewery

a) Venkat

b) Adrian Ward

c) Jeff Mcquigg

d) John Minkjan

20. Which of these is a real professional LinkedIn group

a) Oracle Business Intelligence Group

b) Oracle BI Nerds Group

c) OBIEE Unlimited Group

d) OBIEE for Beginners Group

21. What is the maximum number of characters that display in a textbox

a) 125

b) >100000

c) 256,000

d) 255

22. What is the default size parameter of the CHAR datatype in a Cast function

a) 30

b) 100

c) 2000

d) 255

23. Which of these is NOT a real configuration file

a) NQSConfig.xml

b) NQclusterconfig.ini

c) catalogmanager.ini

d) config.ini

e) rtfm.ini

24. Which of the following is a valid interval

a) TSI_SECOND

b) TIME_SECOND

c) SQL_SECOND

d) SQL_TSI_SECOND

25. In the instanceconfigl.xml there is a tag called DSN. In which file does this tag look for more details?

a) tnsnames.ora

b) odbc.INI

c) NQClusterconfig.INI

d) NQServer.INI

26. True or False: To use LDAP based Authorisation, you have to import the groups into the catalog

a) TRUE

b) FALSE

27. Which of these is NOT related to Security

a) Usage Tracking

b) Authorisation

c) Authentication

d) Data Restrictions

28. Which of these is NOT a real icon?

a) Happy Face

b) Fish

c) Table

d) Heart

e) Laptop

f) Bear

29. Using an embedded URL, How do you include a dashboard from another catalog, without the top border (the one with the dashboard links in)

a) &nfi

b) &Border=False

c) &Syndicate=Sibel

d) &PageOnly=True

e) It just does it for you

30. At the BI Forum in 2009, Who said, ‘Don’t try this at home kids’

a) Christian Berg

b) John Minkjan

c) @drian

d) @lex

31. Which of these of NOT a real obiee object

a) Narrative

b) Text Box

c) Edit box

d) Speech Bubble

32. Which is the lowest logging level that will include the actual SQL sent to the database

a) 4

b) 1

c) 2

d) 3

33. Which of these is NOT a real function in OBIEE?

a) MAVG

b) CURRENT_DAY

c) SPACE

d) RMAX

e) REPEAT

34. What Query Restrictions are NOT available on User Permission settings

a) Time Restriction

b) Max Rows

c) Max Minutes

d) Max tables

35. In which config file do you set the path for your java files that called in an iBot?

a) instanceconfig.xml (in OracleBIData\web\config)

b) NQSConfig.INI

c) config.xml

d) filemap.xml

e) instanceconfig.xml (in the other place!)

36. Which of these is NOT a real Oracle Function

a) LEADING

b) BIN_TO_NUM

c) NANVL

d) SOUNDEX

e) REGEXP_COUNT

37. What Command is used to embed a dashboard Page into another web page?

a) Dashboard&GoURL

b) Dashboard&PortalPath

c) Go&Path

d) Dashboard&GoPath

e) Dashboard&Portal

38. Who won the Americas cup in 2004

a) Switzerland

b) Australia

c) Larry Elison

d) Spain

e) Oracle

39. On February 25th, From the list below, who has the highest number of points on the OBIEE forum on OTN

a) Christian Berg

b) Gerard Nico

c) Jon Mead

d) Goran O

e) Phil Henson

40. How many arguments does the OBIEE function INSERT have?

a) 5

b) 1

c) 4

d) 3

e) 2

41. What is the standard Port number used in 10g for the javahost?

a) 9706

b) 9810

c) 9803

d) 9703

e) 9710

42. Which of the following is NOT a report link

a) Copy

b) Save

c) Modify

d) Refresh

e) Download

43. Which ‘Operator’ is NOT available in a ‘Column Filter Prompt’

a) is less than

b) is in top

c) begins with

d) contains all

e) is LIKE (pattern match)

And the results….

Overall Scores by Team

Don’t mess with Mom, especially if she is on Facebook

Posted on the May 25th, 2010. Read times

Source: The sascom magazine blog [link]


A recent article in the Wall Street Journal shows the perils, real or imagined, of product flaws in this emerging age of social media. Proctor & Gamble is a highly admired company especially for its careful stewardship of its many premier brands. Can it be that any American mother, in a word association game, would not tell you that Pampers equates to diapers?

Imagine the horror of the brand managers for Pampers when Roshana Shah opened a Facebook site to campaign against P&G’s recent market launch of Dry Max diapers. As the WSJ article states, this is not the only site to open up in complaint of the new design. In reading some of the recent posts, there are a lot of very unhappy moms out there. P&G is fighting back in what may be a risky approach with some discrediting of the posters and leaders of this movement. Will it work? Time will tell.

But P&G is pursuing some very positive approaches as well. They are not unaware of the power of social media. Before the product launch, P&G developed a network of Mommy bloggers to sample and test their new design. Seems like a pretty savvy move to me. They also have their own Facebook page for Pampers that states its mission as “Caring for Babies’ Happy, Healthy Development.” The site is filled with tributes and stories of Motherhood. P&G also allows posters to complain and many complaints as well as supportive posts are to be found on the site. It is really amazing. It is also a rich source of data. But how to sort through it all and make sense of it?

Continue reading “Don’t mess with Mom, especially if she is on Facebook”

מפגש מעניין ב BI User Group

Posted on the May 25th, 2010. Read times

Source: Ella Maschiach's BI Blog [link]

קהילת ה BI כולה בבאז ממפגש ה BI User Group הקרוב. המפגש יהיה ביום שני הקרוב, ה 31 לחודש. אישיות בכירה ממיקרוסופט עומדת להגיע ולהרצות על Data Mining. המפגש הוא ללא עלות.
להרשמה ולפרטים נוספים אנא פנו לדף אירוע ה BIUG.

Interesante review de soluciones BI Open Source

Posted on the May 25th, 2010. Read times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]


Presentación cartográfica de SpagoBI

Posted on the May 24th, 2010. Read times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]


Las 5 principales tendencias en Business Intelligence

Posted on the May 24th, 2010. Read times

Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]


Next Page »