BI Blogs

Bringing together Business Intelligence voices from across the web

¿Somos lineales en nuestros análisis?

Posted on the January 31st, 2009. Read times

Source: Information Management [link]

A

ETL Subsystem 28: Sorting

Posted on the January 31st, 2009. Read times

Source: Tod means Fox [link]

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

Before you can aggregate, summarize, look up, or merge data, you will need it ordered in some way. Sorting, which changes the physical order of the underlying data, is seldom necessary. Instead, indexing, which changes the logical order of the underlying data is often all you need. The Kimball Group seems to use Sort interchangeably with Order, so I won’t deviate in this post. This may be because some databases vendors do so as well.

If you’re loading data from a mainframe, then the utility programs DFSORT or SORT will be of some use. Other utilities exist depending on the vendor, but these are the main ones. The idea is to sort the raw physical data before it hits the ETL pipeline. Doing so will save processing time. Otherwise, you can simply load raw data into your staging area and then use standard SQL to order your data.

Sometimes sorting is implicit, so be careful. In SQL Server, sorting occurs when you use ORDER BY, GROUP BY, SELECT DISTINCT, and UNION. You should, for example, avoid the SELECT DISTINCT clause whenever possible, as the additional sorting can hinder performance. Many programmers and (ehem) trained DBAs automatically add the distinct clause to their queries. Be certain that the distinct clause is necessary; otherwise performance will take a hit.

Beware that sometimes too much sorting can be a problem. Here is a general rule: Only sort when you need to! Sorting is an expensive operation and I’ve seen it overused and abused.

Clusters and Heaps

A clustered index determines the physical order of data in a table. That is, data is inserted into the table in order. Pages are linked for faster sequential access. This is in contrast to a heap table, in which data is added to the end of the table and the data pages are not linked.

Clustered indexes can make a big impact in the staging area. Consider a surrogate key lookup dimension that contains 4 columns: surrogate key, business key, valid_from, and valid_to. A clustered index on business key, valid_from, and valid_to will return the surrogate key upon lookup faster than if your lookup dimension was a heap. This is mainly because “queries that return a range of values using operators such as BETWEEN, >, >=, < , and <=" are well suited for the clustered index structure.

This works well during surrogate key lookups where we typically want the surrogate key of some business key on a particular date. Of course INSERTs and UPDATEs take longer on clustered tables because the physical structure of the table needs to be maintained. So be sure to balance speed with need. In my experience: Maintaining a clustered lookup dimension is almost always well worth the effort, especially when the original dimension is big and wide.

Best Practices

Here are some best practices that might help you keep your sorting operations under control:

  • Create one tempdb data file for each physical processor on the machine (more)
  • Put the tempdb files on their own drive (more)
  • Make sure your data types are appropriate for the column (storing numbers in a varchar field is not very efficient)
  • Keep the columns you sort on as narrow as possible (here is where good data profiling will help)
  • Integer sorting is much faster than character sorting, so consider using integers instead of character data whenever possible (more)

 

SQL Server 2005 Integration Services (SSIS)

As I have stated, sorting is often overused. The Sort component that ships with SSIS is good for small data sets that fit entirely into memory. Performance will degrade as soon as there is overflow. One way around the problem:

  1. In the Source Adapter, set the data access mode to “The results of an SQL statement”
  2. Use and ORDER BY clause in the SQL Statement
  3. Open the “Advanced Editor” of the Source component
  4. On the Input and Output Properties tab, click the Source Output node and set the property IsSorted to True.
  5. Under Output Columns, set the SortKey for each column involved in the index (start at 1, use a negative number to denote a descending index order)

 

Hand Coding with Visual FoxPro (VFP9)

Physically sorting a table in FoxPro is not necessary. The physical order of the table is irrelevant because of how VFP uses its indexes. You can test this easily by creating a table with a few million rows containing a randomly generated integer value and another column with some other data (perhaps system time or something). Using the SORT command (or SELECT INTO with an ORDER BY) create a second table in the order of the integer column. Without creating an index, attempt to retrieve a row from the table based on some integer value. Now, create an index on the column in both tables. You should notice no performance differences.

So while you’ll need to understand indexes and rushmore in VFP, you won’t need to bother with physical sorts.

From Here

In the next post, I’ll discuss lineage and dependency analysis. If you work in the financial sector (like me) then this topic will be of critical interest to you. If you don’t, I suspect you still care about lineage and dependencies!

Analyzing BI Apps ETL Runs using OBIEE and the DAC Repository

Posted on the January 30th, 2009. Read times

Source: Rittman Mead Consulting [link]

The other week I posted an article on optimizing the Oracle B Applications ETL process, and one useful set of diagnostic data I highlighted at the time was the ETL step timings that are available in the DAC Console. For every running or completed execution plan in the DAC Repository, you can list out the individual steps in the process together with the start, time, end time, elapsed time, rows process and so on, like this:

Dacsql1

Working out where the time has gone in an ETL process is therefore quite simple, as you can sort the list of tasks by Start Timestamp, or Elapsed time, to quickly focus on on the tasks that have taken the most time to complete. You can also generate reports and graphs that tell you how the time was spent in this ETL run, broken down by ETL stages and so on.

Dacsql2

But what if you want to see the trend for a particular mapping, task group or phase? You can view each execution plan run in the DAC console, locating a particular mapping within each run and making a note of how long it took to run, but clearly it would make more sense if we could load up the data in Excel or in a database and analyze it there. You can, in fact, right-click on the list of tasks in an exection plan and select Output to File, and this generates a delimited flat file that you can import into Excel and analyze. But the import doesn’t in my experience handle dates and times very well, making it hard to analyze task timings, and besides - the data you’re interested in is in a couple of databases anyway and so it’d make more sense to analyze it there.

The first place that you can find task execution details and timings is the Informatica Repository. There’s a view there called REP_SESS_LOG that contains details of every workflow execution recorded in the Informatica Repository. If you run a query against this view, you can see the start and end times for a particular workflow (equating to a task in the DAC), the number of rows processed, and from this you can generate the total run time and the row throughput of the mapping, like this:

select workflow_name
,      to_char(actual_start,'DD-MON-YYYY HH:MI:SS') START_TIME
,      to_char(session_timestamp, 'DD-MON-YYYY HH:MI:SS') END_TIME
,      round((session_timestamp - actual_start) * 86400,2) ELAPSED_TIME_SECS
,      successful_rows
,      failed_rows
,      round(successful_rows / ((session_timestamp - actual_start) * 86400),2) ROWS_PER_SEC
from rep_sess_log where workflow_name = 'SDE_ORA_GLJournals_Full'

Giving you an output , in SQL Developer, like this:

Dacsql3

Using this sort of report, you can see how the two key metrics of elapsed time in seconds, and rows processed per second, have changed over a series of ETL runs. Now this is quite neat, but what the Informatica Repository views don’t have access to is the additional metadata in the DAC such as the Execution Plan name, the Source System Folder (so that you can see if you’re running a custom, out out-of-the-box version of the mapping), the ETL_PROC_WID and so on. For these. you need to run a similar query in the DAC Repository tables, which are a separate set of tables to the Informatica Repository ones, and usually held in their own schema.

select d.etl_defn_name    EXECUTION_PLAN
,      r.phy_folder_name  FOLDER
,      r.step_name        TASK
,      to_char(r.start_ts,'DD-MON-YYYY HH:MI:SS') START_TIME
,      to_char(r.end_ts,'DD-MON-YYYY HH:MI:SS') END_TIME
,      round((r.end_ts - r.start_ts) * 86400,2) ELAPSED_TIME_SECS
,      r.status
,      r.sucess_rows      SUCCESSFUL_ROWS
,      r.failed_rows      FAILED_ROWS
,      round((r.sucess_rows / (round((r.end_ts - r.start_ts) * 86400,2))),2) ROWS_PER_SEC
from   w_etl_defn_run d
,      w_etl_run_step r
where  d.row_wid = r.run_wid
and    r.step_name = 'SDE_ORA_GLJournals'
order  by d.etl_proc_wid desc

This query returns the same basic information as the previous one, except this time we’ve got some information on the Execution Plan, Source System Folder and so on, and it’s sorted by the ETL_PROC_WID so the most recent run is at the top.

Dacsql4

So far, so good. If you were particularly clever with analytic functions and so on, you could probably write some SQL now to show you the timings have changed for a particular mapping over time, or you could highlight those tasks whose timing over a window of ETL runs has changed more than a certain percentage. But given that we’ve got a perfectly good BI tool available that we can also use to analyze report run times, why don’t we use OBIEE to analyze this data as well?

If you bring the key DAC tables into the OBIEE physical layer, and the S_ETL_DAY table from the Usage Tracking tables if you want to analyze your ETL runs over time, you can then create a logical dimensional model out of them. W_ETL_DEFN_RUN and W_ETL_RUN_STEP become your fact tables, the other tables in the DAC schema make up the task and execution plan dimensions, and the S_ETL_DAY table gives you a ready-made set of time dimension rows.

Dacsql5-1

So now you can start to analyze your DAC repository data using OBIEE. Here’s a quick table of ETL runs:

Dacsql6

You can also create some nice graphs showing, for example, row throughput and time elapsed over time for a particular DAC task.

Dacsql7

… Or perhaps provide a breakdown of the total ETL time by task phase.

Dacsql8

So all in all this is not a bad complement to the DAC statistics you can get from the BI Management Pack, and the query run statistics you can get from the Usage Tracking tables and reports.. In terms of how you set the RPD up for this, the joins between the physical tables look like this (W_ETL_GROUP is an alias for W_ETL_STEP):

Dacsql9

The logical model is then derived from this, with two fact tables (one for ETL runs, and one for task runs), two conformed dimensions and a single one used just by the Task Runs fact table.

Dacsql10

The Task Runs fact table is sourced from the W_ETL_RUN_STEP table that contains most of the details of the task execution, together with the W_ETL_DEFN_RUN table that provides the ETL_PROC_WID and details of the execution plan that called the mapping.

Dacsql11

The Task dimension is a bit more interesting, as you have to denormalize the W_ETL_STEP, W_ETL_FOLDER and W_ETL_PHASE tables to get the main task details, the folder it points to and the phase it’s assigned to; then you have to join through the W_ETL_GROUP_STEP table to find the task group, then join back to the W_ETL_STEP table (aliased as W_ETL_GROUP) to get the task group name. The rest of the tables are fairly simple.

Dacsql12

Watch out for the join between the S_ETL_DAY table and the DAY_DT column in the W_ETL_RUN_STEP table, as the latter includes minutes and seconds and the former doesn’t - I had to do a complex physical join between the two tables and CAST () the DAY_DT column as DATE to truncate the seconds off. Once you’ve got this all set up though, you can start to create calculations based on the timings in the DAC tables, such as this one to calculate the elapsed time.

Dacsql13

You can even create a DAC analysis dashboard, so that you can monitor your ETL process along with your query timings, create alerts and guided analytics, and provide the ability to drill-down from the overall execution plan to show individual mappings, task groups and phases.

Dacsql14

Like I said, not bad. Don’t forget that with the DAC tables at least, none of these are “public views” or APIs so use them at your own risk, they may well change from release to release. But it’s a nice way of making use of the data in the DAC Repository, and giving you the potential later on to combine it with query response time data from the Usage Tracking tables into a single “DBA View” of what’s going on in OBIEE and the DAC.

Gutsy Decisions?

Posted on the January 30th, 2009. Read times

Source: Keep It Simple [link]

I enjoyed two interesting posts this morning on the topic of good old gut feel vs. fact-based decision making. But first, because it’s Friday, here’s a Dilbert strip that takes a shot at the gut:

The first post, Several Executives Trust Gut, provides a lot of great decision-making statistics from Aberdeen and Accenture that shed light on both the problem and opportunities for business intelligence better solutions. For example, did you know that 40% of execs trust gut feel?

The second post asks the question, “Is There Such a Thing as Good Gut Decisions?” Timo Elliott, who I believe still has a job at SAP after their layoff announcement this week, writes a thoughtful post that includes research from Malcolm Gladwell, the Vitruvian Man with a brain in his stomach, and some of his always hilarious cartoons. His conclusion:

“We all have a lot more work to do to make sure that business intelligence actually leads to better real-life decisions. “

Here are 5 signs you need better sales analytics:

5 Signs You Need On-Demand Sales Analytics t-family: tahoma,arial; height: 26px; padding-top: 2px;”>View more presentations from Darren Cunningham. (tags: analysis reporting)

Share/Save/Bookmark

Taste of success for a small retailer

Posted on the January 30th, 2009. Read times

Source: The sascom magazine blog [link]

Those of you who enjoy a Merlot, Bordeaux or Pinot Noir know how difficult it can be to leave a wine store empty handed. There are so many varietals, geographies and labels. They all offer the temptation, untapped potential and opportunity of flavors not previously experienced. During this recession, I’ve become keenly aware of the frequency of my wine purchases. My inventory is rather sparse and easily managed.

Transitioning to the survival of many small and midsize firms, they get hit first and hardest in a recession. The SMB/SMEs are less likely to gain new lines of credit and have limited potential to raise money in equity markets.

The retail fallout
This year, we seem to be transfixed by the collapse of retailers, of which most are small and midsize firms. As many as 200,000 retailers could shut their doors this year, up from 160,000 in 2008, according to Burt Flickinger, managing director of Strategic Resource Group, a retail consultancy.

Recently, one wine retailer made me reconsider the attributes, innovation, and risks in small business. This is the story of The Winehouse, a California merchant that started 20+ years ago as an old-fashion brick & mortar store before it found the internet. They have US$20 million in annual sales, 38 employees and no IT team whatsoever!

The Winehouse had a Microsoft point-of-sale system but could not see into inventory trends, seasonal trends or product turnover. The company owner, Bill Knight, realized they had a problem and wanted to learn more about what business intelligence might do for them. They attended a SAS workshop and, a few weeks later, acquired SAS Enterprise BI for Midsize Business software and implementatio from a SAS channel partner Modern Analytics.

The discovery
Knight says one newly available category of inventory data “stood out like a sore thumb:” the amount of inventory that had been in stock a year or more. The Winehouse had the system running for only two weeks when they discovered 1,000 cases of wine stuck in inventory.

If you know anything about retail, you know the key is turning over the inventory numerous times per year. Unfortunately for The Winehouse, when you have almost 1,000,000 bottles at the warehouse, it’s difficult to tell what inventory is moving. The SAS solution helped them finally uncover $400,000 of inventory that was going nowhere. The Winehouse had a clearance sale immediately and invested that cash into new products that were actively selling.

After hearing so many stories about large retailers losing money in this economy, isn’t it nice to hear of a retailer improving the way it does business instead of simply cutting staff?

The death of prevRow = row.clone()

Posted on the January 30th, 2009. Read times

Source: bayon blog [link]


Aster Data Systems supports DW with MapReduce

Posted on the January 30th, 2009. Read times

Source: Blog: Richard Hackathorn [link]

At the Boulder BI Brain Trust, we had a discussion with Aster Data Systems about their nCluster product and in-database MapReduce. It is a great discussion that got us to ponder SQL limitations and virualization potential. See the complete blog here.

SWF Search-ability Announcement from Adobe and How It Relates to Xcelsius 2008

Posted on the January 30th, 2009. Read times

Source: Laura Gibbons Scorecard Application & Six Sigma Blog [link]

Imagine Xcelsius dashboards especially built in 2008 with its flexible add-on component manager making it that much easier to customize components (think objects / widgets like scatterplots which are offered out of the box as a chart type)..

 

Now, we have a best practice for monetizing the SWF content that is part of your Xcelsius 2008 dashboard…here is what Adobe had to say:

 

Adobe is teaming up with search industry leaders to dramatically improve search results of dynamic web content and rich Internet applications (RIAs). Adobe is providing optimized Adobe Flash Player technology to Google and Yahoo! to enhance search engine indexing of the Flash file format (SWF) and uncover information that is currently undiscoverable by search engines. This will provide more relevant automatic search rankings of the millions of RIAs and other dynamic content that run in Adobe Flash Player. Moving forward, RIA developers and rich web content producers won’t need to amend existing and future content to make it searchable—they can now be confident that it can be found by users around the globe.

Why is this news important?

Adobe is working with Google and Yahoo! to enable one of the largest fundamental improvements in web search results by making the Flash file format (SWF) a first-class citizen in searchable web content. This will increase the accuracy of web search results by enabling top search engines to understand what’s inside of RIAs and other rich web content created with Adobe Flash technology and add that relevance back to the HTML page.

Improved search of SWF content will provide immediate benefits to companies leveraging Adobe Flash software. Without additional changes to content, developers can continue to provide experiences that are possible only with Adobe Flash technology without the trade-off of a loss in search indexing. It will also positively affect the Search Engine Optimization community, which will develop best practices for building content and RIAs utilizing Adobe Flash technologies, and enhance the ability to find and monetize SWF content.

Why is Adobe doing this?

The openly published SWF specification describes the file format used to deliver rich applications and interactive content via Adobe Flash Player, which is installed on more than 98 percent of Internet-connected computers. Although search engines already index static text and links within SWF files, RIAs and dynamic web content have been generally difficult to fully expose to search engines because of their changing states—a problem also inherent in other RIA technologies.

Until now it has been extremely challenging to search the millions of RIAs and dynamic content on the web, so we are leading the charge in improving search of content that runs in Adobe Flash Player. We are initially working with Google and Yahoo! to significantly improve search of this rich content on the web, and we intend to broaden the availability of this capability to benefit all content publishers, developers, and end users.

Which versions of the SWF file format will benefit from this improved indexing and searching?

This solution works with all existing SWF content, across all versions of the SWF file format.

What do content owners and developers need to do to their SWF content to benefit from improved search results?

Content owners and developers do not have to do anything to the millions of deployed SWF files to make them more searchable. Existing SWF content is now searchable using Google search, and in the future Yahoo! Search, dramatically improving the relevance of RIAs and rich media experiences that run in Adobe Flash Player. As with HTML content, best practices will emerge over time for creating SWF content that is more optimized for search engine rankings.

What technology has Adobe contributed to this effort?

Adobe has provided Flash Player technology to Google and Yahoo! that allows their search spiders to navigate through a live SWF application as if they were virtual users. The Flash Player technology, optimized for search spiders, runs a SWF file similarly to how the file would run in Adobe Flash Player in the browser, yet it returns all of the text and links that occur at any state of the application back to the search spider, which then appears in search results to the end user.

How are Google and Yahoo! using the Adobe Flash technology?

Google is using the Adobe Flash Player technology now and Yahoo! also expects to deliver improved web search capabilities for SWF applications in a future update to Yahoo! Search. Google uses the Adobe Flash Player technology to run SWF content for their search engines to crawl and provide the logic that chooses how to walk through a SWF. All of the extracted information is indexed for relevance according to Google and Yahoo!’s algorithms. The end result is SWF content adding to the searchable information of the web page that hosts the SWF content, thus giving users more information from the web to search through.

When will the improved SWF searching solutions go live?

Google has already begun to roll out Adobe Flash Player technology incorporated into its search engine. With Adobe’s help, Google can now better read the SWF content on sites, which will help users find more relevant information when conducting searches. As a result, millions of pre-existing RIAs and dynamic web experiences that utilize Adobe Flash technology, including content that loads at runtime, are immediately searchable without the need for companies and developers to alter it. Yahoo! is committed to supporting webmaster needs with plans to support searchable SWF and is working with Adobe to determine the best possible implementation.

How will this announcement benefit the average user/consumers?

Consumers will use industry leading search engines, Google now and Yahoo! Search in the future, exactly as they do today. Indexed SWF files will add more data to what the search engine knows about the page in which it’s embedded, which will open up more relevant content to users, and could cause pages to appear at a higher ranking level in applicable search results. As a result, millions of pre-existing rich media experiences created with Adobe Flash technology will be immediately searchable without the need for companies and developers to alter content.

When will the new results register on Google?

Google is using the optimized Adobe Flash Player technology now, so users will immediately see improved search results. As Google spiders index more SWF content, search results will continue to get better.

How will this announcement benefit SWF content producers?

Organizations can now dramatically improve the rich web experiences they deliver to customers and partners by increasing the use of Adobe Flash technology, which is no longer impeding the ability for users to find those experiences in highly relevant search results. RIA creators and other web content producers can now be confident that their rich media and RIA experiences leveraging Adobe Flash technology are fully searchable by users around the globe who use the dominant search engines. Furthermore, the ability to index information extracted throughout the various states of dynamic SWF applications reduces the need to produce an HTML or XHTML backup for the RIA site as a workaround for prior search limitations.

Does this affect the searchability of video that runs in Adobe Flash Player?

This initial rollout is to improve the search of dynamic text and links in rich content created with Adobe Flash technology. A SWF that has both video and text may be more easily found by improved SWF search.

Will Adobe Flex applications now be more easily found by Google search, including those that access remote data?

Yes, any type of SWF content including Adobe Flex applications and SWF created by Adobe Flash authoring will benefit from improved indexing and search results. The improved SWF search also includes the capability to load and access remote data like XML calls and loaded SWFs.

Does Adobe recommend a specific process for deep-linking into a SWF RIA?

Deep-linking, in the case of SWF content and RIAs, is when there is a direct link to a specific state of the application or rich content. A variety of solutions exist today that can be used for deep-linking SWF content and RIAs. It’s important that sites make use of deep links so that links coming into a site will drive relevance to the specific parts of an application.

To generate URLs at runtime that reflect the specific state of SWF content or RIA, developers can use Adobe Flex components that will update the location bar of a browser window with the information that is needed to reconstruct the state of the application.

For complex sites that have a finite number of entry points, you can highlight the specific URLs to a search spider using techniques such as site map XML files. Even for sites that use a single SWF, you can create multiple HTML files that provide different variables to the SWF and start your application at the correct subsection. By creating multiple entry points, you can get the benefits of a site that is indexed as a suite of pages but still only need to manage one copy of your application. For more information on deep-linking best practices, visit www.sitemaps.org/faq.php.

Is Adobe planning on providing this capability to other search vendors too?

Adobe wants to help make all SWF content more easily searchable. As we roll out the solution with Google and Yahoo!, we are also exploring ways to make the technology more broadly available.

Where to go from here

For for more information from Google on SWF search, read Improved Flash indexing on the Official Google Webmaster Central Blog.

mp;NA=1149&PI=73329&RF=&DI=3919&PS=85545&TP=scorecardstreet.spaces.live.com&GT1=scorecardstreet”>

.NET vs. Java Consumer SDK - BusinessObjects Enterprise

Posted on the January 30th, 2009. Read times

Source: Laura Gibbons Scorecard Application & Six Sigma Blog [link]

The Java and .NET versions of the consumer SDK are identical in functionality. The two versions of the SDK are generated from a common set of Web Service Definition Language (WSDL) files. As a result, they possess identical class names and inheritance patterns. There are differences between the two, however, that are addressed in this section.

Note:    For more information on the Platform Web Services WSDL, see Using the WSDL instead of the consumer API.

Organization of plugin classes

It is the goal of this SDK to provide the same organizational structure of plugin classes as provided in the traditional, non-web services Enterprise SDK.

In Java, classes are organized in packages where the name of the plugin is part of the package. For example, the CrystalReport class is located in the com.businessobjects.enterprise.crystalreport package, while the Folder class is located in the com.businessobjects.enterprise.folder package.

In .NET, classes are organized in namespaces based on its plugin type. There are separate namespaces for destination, authentication, desktop, and encylopedia plugin classes. For example, both the CrystalReport and Folder classes are desktop plugins, so they are located in the BusinessObjects.DSWS.BIPlatform.Desktop namespace.

There is also a separate namespace for system rights in .NET.

Representation of class properties

WSDL class properties are generated differently in Java and .NET. In Java, properties are generated as getX and setX methods, where X is the name of the property. In .NET, properties are generated as fields.

In this guide, the term "property" refers to both the class method in Java and its field equivalent in .NET.

Capitalization of method names

In Java, method names begin with a lowercase character. In .NET, method names begin with an uppercase character.

In this guide, the convention is to refer to a method by its Java case.


px” height=”0px” src=”http://c.live.com/c.gif?NC=31263&NA=1149&PI=73329&RF=&DI=3919&PS=85545&TP=scorecardstreet.spaces.live.com&GT1=scorecardstreet”>

Formula Caching and Query Scope

Posted on the January 30th, 2009. Read times

Source: Chris Webb's BI Blog [link]

The Analysis Services formula engine’s ability to cache the results of calculated members can be crucial to the overall performance of your cube, but it’s all too easy to turn off this caching. I’ve blogged already about the impact that subselects have on this and today I thought I’d discuss how you need to be careful using different calculation contexts. This is a topic that’s covered in the Analysis Services 2008 Performance Guide, but I thought it would be worth talking about here because it does deserve a lot more visibility as an important cause of poor query performance, especially in SSRS reports with hand-coded MDX.

As you know, you can define calculated members in three different places in Analysis Services: on the cube (the global context), within the session (the session context) and in the WITH clause of a query (the query context). The important paragraph from the Performance Guide is this one: 

If the query processor encounters calculations created at query time, it always uses the query context, even if a query also references calculations from the global context (there is an exception to this – queries with query calculated members of the form Aggregate(<set>) do share the session cache). If there are no query calculations, but there are session calculations, the query processor uses the session cache. The query processor selects the cache based on the presence of any calculation in the scope. This behavior is especially relevant to users with MDX-generating front-end tools. If the front-end tool creates any session calculations or query calculations, the global cache is not used, even if you do not specifically use the session or query calculations.

What does this mean in practical terms though? Consider the following query on Adventure Works that uses a calculated measure defined on the cube, Internet Ratio to Parent Product:

SELECT [Measures].[Internet Ratio to Parent Product] ON 0,
[Product].[Product Categories].MEMBERS ON 1
FROM [Adventure Works]

Clear the cache and run the query and you’ll see the normal stuff taking place if you run a Profiler trace. Then, if you rerun the query on a warm cache, you should see something like this in Profiler (if you are looking at the Get Data From Cache event):

image

Clearly the result of the calculation has been cached, and the second time you run the query you’re retrieving values from the formula engine cache. Now, consider the following query:

WITH MEMBER MEASURES.TEST AS 1
SELECT [Measures].[Internet Ratio to Parent Product] ON 0,
[Product].[Product Categories].MEMBERS ON 1
FROM [Adventure Works]

We’re now creating a calculated measure in the WITH clause but not using it in the query. What happens when we run this query on a warm cache?

image

We’re still hitting cache, but it’s the storage engine cache and not the formula engine cache; the calculations are being re-evaluated for the query. So, even the presence of a calculated measure in the WITH clause, even though we’re not actually using it, prevents the use of the formula engine cache and if we’ve got complex calculations used in our query this could make our overall query performance significantly worse.

Note that the presence of a named set in the WITH clause does not have the same effect, so the following query can use the formula cache:

WITH SET TEST AS {}
SELECT [Measures].[Internet Ratio to Parent Product] ON 0,
[Product].[Product Categories].MEMBERS ON 1
FROM [Adventure Works]

So, some recommendations that follow on from this:

  • If calculation performance is a problem for you, be very, very wary of writing queries that have a WITH clause. Where possible, put the calculated members you need on the cube; even if you don’t want other users to see them you can always set Visible=false.
  • When evaluating client tools look out for those that create session or query scoped calculated members for their own internal use (running a Profiler trace will show you whether they do). This could have the side-effect of reducing query performance on calculation-heavy cubes.

ition:absolute” alt=”" width=”0px” height=”0px” src=”http://c.live.com/c.gif?NC=31263&NA=1149&PI=73329&RF=&DI=3919&PS=85545&TP=cwebbbi.spaces.live.com&GT1=cwebbbi”>

Im excited (why you might ask?)

Posted on the January 30th, 2009. Read times

Source: Blogging about all things SAS [link]

Well because at our local User Conference in February, I am goign to finally get to see SAS 9.2 in the flesh!

http://www.sunz.net.nz/conferences/sunz-2009/sunz-2009—agenda.html

The guys from SAS NZ have managed to get hold of the SAS 9.2 demo’s from the Global Forum/SAS Australia forum last year, and as I didn’t manage to make either of those I get to see them in February.

Yehaaa, Christmas all over again!

Forwarding: Webcast Announcement — Going Beyond ETL, Next Generation Data Integration and Management

Posted on the January 29th, 2009. Read times

Source: Oracle Warehouse Builder (OWB) Weblog [link]


Data Quality is Job Number 1

Posted on the January 29th, 2009. Read times

Source: Oracle Warehouse Builder (OWB) Weblog [link]


PerformancePoint in SharePoint brings BI to the Masses

Posted on the January 29th, 2009. Read times

Source: Calumo - Business Intelligence and Performance Management Blog [link]

PerformancePoint in SharePoint brings BI to the Masses

Microsoft Announces Updates to Business Intelligence Roadmap

A summary of this announcement is the decision to discontinue PerformancePoint Planning
and merge PerformancePoint Monitoring & Analytics into SharePoint as PerformancePoint
Services.

Customers Benefit with Business Intelligence at a lower TCO
Effective immediately, SharePoint enterprise customers can download PerformancePoint
for free. Conversely, customers who bought PerformancePoint with software assurance
can download SharePoint for free.

View
this video with Guy Weismantel
, Director of Microsoft BI, where he explains these
changes and the future value Microsoft customers will gain from this strategy.

Also read
the Q&A with Kurt DelBene
, Senior VP of Office Business Platform Group at
Microsoft. The interview gives some insight into the strategy announcement, why Microsoft
made this decision, how planning customer will be supported and what the overall benefit
to customers will be.

About CALUMO

The CALUMO Group (www.calumo.com), Microsoft BI
Partner of the Year 2008, is a leading provider of Performance Management solutions
for enterprise-wide reporting planning and Business Intelligence. Since 1998 the group
has successfully delivered solutions to a wide range of enterprises from SME’s to
some of the largest listed companies and government organisations. These solutions
provide quantifiable and valuable business insight, offering a single platform from
which to integrate Corporate, Financial and Operational performance and objectives. 

 
The Group is a Microsoft Gold Certified Partner in Business intelligence (achieving
full competency in both Platform and Performance Management Specializations), Data
Management Solutions and ISV/Software Solutions.

CALUMO Planning on Microsoft BI Platform

Posted on the January 29th, 2009. Read times

Source: Calumo - Business Intelligence and Performance Management Blog [link]

PerformancePoint Services included in MOSS

On 24 Jan Microsoft announced the decision to consolidate PerformancePoint Server
Monitoring and Analytics into Microsoft Office SharePoint Server. Microsoft is making
it easier for customers to achieve “BI for the masses” by including PerformancePoint
Services in the SharePoint Enterprise license thereby delivering the complete Microsoft
BI Platform (SharePoint, Excel and SQL Server) to customers at an even lower total
cost of ownership.
 
PerformancePoint Planning Discontinued

Microsoft also announced that PerformancePoint Planning will be discontinued. It must
be remembered that PerformancePoint Planning was only a planning application built
on the Microsoft BI Platform. Discontinuing this first version planning application
is actually liberating to what is now an even more unified BI platform comprising
SharePoint, Excel and SQL Server.

Performance Management = Planning + Business Intelligence

Discontinuation of PerformancePoint Planning does not mean that customers cannot “do
planning” on the Microsoft BI Platform. The essence of planning applications (budgeting
and forecasting) is allowing users to input feedback both numbers and text (write
back). Planning also includes other features (eg manipulating data according to business
rules etc), but most of these are also BI features. The term Performance Management
is used to describe the convergence of Planning and BI on a unified platform – the
cycle of Plan, Monitor, Analyze. The advantages of true Performance Management are
readily evident by integrating planning with BI in order to analyze and compare history
with future plans and what-if analyses. The architecture of effective Performance
Management must be unified.  Failing to do so requires constant batching and
transformation as the many parts of the system exchange data.  Few (if any) vendors
other than Microsoft can lay claim to this architecture.

How then to achieve Planning + Performance Management on Microsoft’s BI platform

CALUMO Planning is a complete planning application built on the Microsoft BI platform.
The CALUMO features below summarize planning capability, the missing pieces of the
performance management puzzle, with integration to SharePoint, Excel and SQL Server.

  1. Planning input forms - Bottom-Up and Top-Down write back
  2. Self Service Reporting and Analysis - Ad-hoc (pivot) browse and dynamic reports
  3. Excel and Web client for ad-hoc browse, reports and input forms
  4. Simple, flexible architecture - Planning over any SQL Server Analysis Services cube
    structures
  5. Support for: Drill Down (Members), Drill Through (Transactions or Cubes), Grids, Charts,
    Freeze frames, Text management, Slice to Excel etc.
  6. Integration with SharePoint, Excel and SQL Server

>Planning is core to Performance Management

Alfred E. Neuman “What, me worry?”  Hecklers will claim
a gap-toothed smile in Microsoft’s BI capability. Don’t be fooled, CALUMO is evidence
that planning is not missing and will continue to work with customers and Microsoft
to offer true Performance Management on Microsoft’s unified BI Platform. Web
budgeting at the University of New South Wales
is just one example of this.

CALUMO - Microsoft Business Intelligence Partner of the Year

For the CALUMO Group, 2008 Microsoft BI Partner of the Year, the withdrawal of PerformancePoint
Planning has removed any perceived confusion or conflict over differences between
PerformancePoint Planning and CALUMO Planning. CALUMO Performance Management, including
planning has a credible track record and heritage, including Microsoft case studies
at the University of
New South Wales
and Aevum

About CALUMO

The CALUMO Group (www.calumo.com) is a leading
provider of Performance Management solutions for enterprise-wide reporting planning
and Business Intelligence. Since 1998 the group has successfully delivered solutions
to a wide range of enterprises from SME’s to some of the largest listed companies
and government organisations. These solutions provide quantifiable and valuable business
insight, offering a single platform from which to integrate Corporate, Financial and
Operational performance and objectives. 

 
The Group is a Microsoft Gold Certified Partner in Business intelligence (achieving
full competency in both Platform and Performance Management Specializations), Data
Management Solutions and ISV/Software Solutions.

Advice for Sales Management in Difficult Economic Times

Posted on the January 29th, 2009. Read times

Source: Keep It Simple [link]

Jill Konrath has posted a great summary of sage advice for sales management on her Selling to Big Companies blog. Her post, The Pros Weigh in on How to Help Your Salespeople in Tough Economic Times, includes wisdom from sales gurus such as Keith Rosen, Bill Guertin, and Drew Stevens.

There’s a lot of back-to-basics advice as well as sales mentoring and coaching recommendations. If you’re tempted to revert back to the old Sales 1.0 mindset of, “the beatings will stop when morale improves,” be sure to take a few minutes to read this post.

Not surprisingly, managing by the numbers instead of gut-feel is mentioned by many of the gurus. I particularly like the quote by Lee J. Colan, Ph.D. and author of “Sticking to It: The Art of Adherence”:

Downturn leadership requires laser-like focus. Focus to reinforce customer service, existing customer relationships, and presence in the marketplaces. This results in improved perception of market position and stronger, more profitable customer relationships (again, what every sales leader wants more of).  Focus on the “vital few” - the 20 percent of customers, product lines, industries that has the greatest impact. Do not only rely on your instincts to identify your vital few—use data to determine the truth about your sales and customers.”

Be sure to check out the entire post here.

Share/Save/Bookmark

BIDS Helper 1.4 Released

Posted on the January 29th, 2009. Read times

Source: Dan English's BI Blog [link]

Just wanted to let everyone know that a new release came out this week.  Check out the add-in for BIDS out on CodePlex here – BIDS Helper.  The new features include the following (along with a few enhancements):

Release 1.4 contains a number of new features:

  • Services
  • ng Services
  • ains a number of enhancements and bug fixes:

      vices
  • Services
  • here are a couple of features for Reporting Services nowSmile see how datasets are being used and also which ones are not.  Even an option to delete those cached dataset files from within BIDSOpen-mouthed  SSIS now has a documentation report that you can generate to get some high-level information to for the packages.  I see that there has been one source code change since the release, so if you want to be on the latest and greatest then check out the source code section.

    Float Data Types

    Posted on the January 29th, 2009. Read times

    Source: Mark Garner's Business Intelligence Blog [link]

    A

    Five Critical Criteria to start your BI Initiative

    Posted on the January 29th, 2009. Read times

    Source: Business Intelligence in Real Life [link]


    1.       Desire Intelligence

    This may sound like a given, right?  Well, not necessarily.   Often times I have seen companies approach a BI initiative simply because it is a current industry trend, other CIO’s are doing it so why don’t we.  If our competitors are doing it then we need to stay competitive and do it as well.  WRONG!  Like many things in life, doing something because you feel like you feel like you are forced to will only result in a half hazard attempt and ending in a costly mistake.  So the first step, the first requirement to start a BI initiative is to desire intelligence.  You must crave it, want it, need it, hunger after it, you get the message.  What does that really mean?  You must have a passion for improving your organization through the gathering and analyzing data and applying intelligence in order to turn that data into usable, actionable, priceless information.

    Example - While working for a BI vendor, we had a new branch manager come in and he started to immediate gather and analyze information on the current branch he just acquired.  Looking at contracts that were in place, looking at opportunities that were in the pipeline and most importantly looking at the employees.  All this to ensure that this branch would be successful.  During my interview with him, he mentioned to me that he was unsure of what to think about me because I did not have a long history of selling enterprise level software and as a sales engineer that is important.  Well after about an hour of discussion, he told me that he knew that I would be successful because I was extremely passionate about Business Intelligence and helping our customers achieve success.  He was less concerned at that point about what I have done in the past because he was confident of what I will do in the future.  This is an example of how we should approach BI, like the branch manager, come into the project with the desire to learn and analyze the data, recognize passion and don’t let the past be a stumbling block…we will get more to that point in criteria #3.

    2.       Know your Business and your Industry

    Okay, now I may sound like I am contradicting myself because earlier I said that you shouldn’t go after BI just because it is a popular thing to do or your competitors are doing it.  In context that is true, the entire process should start with desire to make your organization more intelligent.  Another thought here is, “Of course I know my business”.  Fair enough but many times when an organization is implementing some type of Business Intelligence solution, it is lead by the IT department.  While the IT group in your organization may be very familiar with your business and industry, it is only secondary to their knowledge of technology.  Those running the business, those responsible for the primary activities of the revenue generating functions of your business are really the ones who should lead these types of projects.  Without a business sponsor, the project is surely on a path of failure.

    3.       Forget about limitations

    Another critical reason for these types of initiatives to be driven by the business and not IT is it is much easier for the business to forget about limitations.  Business Intelligence is all about innovation, which I will be blogging about in the coming week (innovation that is).  A major adversary of innovation are limitations.  By first taking into consideration our limitations, we automatically limit the possibilities and success of our solutions.  If IT were driving this type of project, they would automatically be biased by technology limitations.  So know your thinking, “What good is it to come up with an idea that is impossible, that just wastes time and money.”  Oh contraire mon fraire, you see as humans we always limit ourselves by what we know.  Just because we do not know how to do it doesn’t mean it cannot be done.  I learned this lesson the hard way; I was the IT Director for a company whose CEO was constantly coming up with impossible ideas.  In the beginning, the phrase, “We can’t do that, that is impossible” was a common phrase being thrown around.  After repeating that over and over and after reading Bill Gates book “Business at the speed of thought”, I realized that I was too quick to make this statement.  This realization helped me create many solutions at that company that was years in advanced of what was commercially available, and believe it or not, at a fraction of the cost.  It is a natural human propensity to limit ourselves by our own knowledge.

    On the business side, make sure you do not think also in terms of what is available, don’t limit yourself to knowledge you know is within your industry but rather information that may be outside your industry but has a direct or indirect affect on your business.  For example, I was working with a prospect on a potential solution, when learning their process for forecasting, I learned that they use over 25 different variables in order to forecast their product manufacturing.  They were able to look beyond their history, so in other words, they did not just look at past performance but rather measured that performance against other variables to establish a relationship and conversely apply that relationship toward future trends of those variables, for example consumer confidence, gasoline prices, new home sales, all of these examples require data that is most likely not controlled by your organization but by leveraging this type of out of our boundaries data you can make your organization much more intelligent.

    4.       Don’t forget what you learned in grade school

    I know you learned a lot of basic stuff in grade school but here I am referring to more specifically your science class, remember, the one Ms. Wilson taught, the one about the scientific method.  Just in case you do not remember, here they are (http://www.sciencebuddies.org/science-fair-projects/project_scientific_method.shtml) :

    a.       Ask a Question

    b.      Do your Research

    c.       Construct a Hypothesis

    d.      Test your Hypothesis

    e.      Analyze your data and draw a conclusion

    f.        Communicate your results

    Following this method will help you form what your solution should be.

    5.       Break requirements up into must have and nice to have

    The “Communicate your results” in the scientific method actually makes up this portion of the criteria of success.  You must document everything you have learned up until this point in order to gain support for your new BI initiatives.  This information will be vital in generating expected ROI.  This document will also be vital in the next phase of your BI implementation which is selecting a technology vendor.  In the coming week I will outline some guidelines in selecting a technology vendor, and you may be surprised by what I say.

    Future DBMS Appliance Possibilities

    Posted on the January 29th, 2009. Read times

    Source: Blog: Dan E. Linstedt [link]

    We live in a world proliferated with hand-held devices.  These devices can watch TV, see streaming movie content, browse the web, and provide interactivity via iconography and touch panels.  Yet, we have serious problems with the delivery mechanism of an EDW, and BI on these devices.  Yes, we can produce graphs and charts, and web-based reports - but it all appears to be back-ended by large scale systems that must be up all the time, and that we have to be interconnected to the web in order to “work” with our applications.

    There are needs out there beyond the connected, that I will explore in this entry.

    Next Page »