Posted on the March 31st, 2007. Read times
Source:
OLAP/BI/IM stuff
[link]
Posted on the March 31st, 2007. Read times
Source:
OLAP/BI/IM stuff
[link]
Posted on the March 31st, 2007. Read times
Source:
OLAP/BI/IM stuff
[link]
Posted on the March 30th, 2007. Read times
Source:
Jesse Orosz: Analysis Services Blog
[link]
After about a week and a half of ongoing discussions we’ve come up with an explanation as to why the queries were returning bad numbers. Thank goodness!
To make a long story short, I had manually designed aggregations that I shouldn’t have. We have a reference dimension with materialized set to False. This reference dimension should not have been included in any aggregations. The Aggregation Design Wizard is smart enough not to do this, however you can hack the XMLA to do so yourself. The Aggregation Manager Utility included in SP2 also allows one to add aggregations to these type of dimensions.
The reason I did not set materialize to True is because when a sales person moves between business groups that revenue needs to follow them to that new group.
So if you have a reference dimension with materialized set to False - DO NOT INCLUDE IT IN ANY AGGREGATIONS.
Below is the explanation from the support engineer.
****************************************************************************************
Reference dimension
Dimension that indirectly relates to fact data through intermediate dimension. In this case, [Business Unit] is reference dimension through [Sales] dimension as intermediate dimension
Materialize setting
Materialize=True for reference dimension will put the key attribute for the reference dimension in the fact data during processing. Therefore the reference dimension is the same as regular dimension with this setting.
Materialize=False will not include attribute from reference dimension to the fact data during processing. So access reference dimension members will happen on the fly by going through intermediate dimension. As one can see, this may impact query performance. Therefore the default setting is have Materialize=True for reference dimension
What the correct behavior should be
When Materialize is set to False for reference dimension, aggregation should not include any attribute from the reference dimension. The correct behavior should be the processing would raise an metadata validation error. A design change request to implement this has been filed to the next version of Analysis Services.
What is the problem
With this aggregation included in the cube, the aggregation creation routine does not know how to aggregate the measure from fact and the reference dimension attribute as the fact does not have the attribute from the reference dimension, therefore the measure is null for this aggregation. In the pool of aggregations created, there is another aggregation which is a good one that also covers this query, but the good aggregation is at the upper level. After full processing, both aggregations are created. When the query is executed, the query execution engine starts from the top level and goes down to find the covering aggregation, in this case, it picks good aggregation (Aggregation 34) and returns the correct result.
After Process Update on Sales dimension, Process Update will drop all the aggregations that would be affected by the data change, in this case, good agg Aggregation 34 got dropped, but the “bad” aggregation stayed as it is deeper than the good aggregation. So now the query would start picking the “bad” aggregation as the good one was gone, and returned null for the measure.
Solutions:
The direct solution to this problem is to remove this aggregation (Aggregation 30). To keep this aggregation, one would need to set Materialize to True for the reference dimension.
Since Process Update will always drop affected aggregations, it is recommended to rebuild the aggregation by running Process Index after Process Update. In this case, Process Update will recreate the good aggregation and avoid the problem for this particular query.
We will be working with content team on a KB article for this behavior as well.
?space=jesseorosz&page=RSS%3a+Explanation+Behind+the+Incorrect+Query+Results&referrer=” width=”1px” height=”1px” border=”0″ alt=”">
Posted on the March 30th, 2007. Read times
Source:
Dylan's BI Study Notes
[link]
Knowledge Module can determine how the data will be loaded from the source to the target database. Each Knowledge Module is a text based template that is used to generate the codes. It includes the skeleton, which can be in any language or technology. For example, if I select “Oracle SQL loader” LKM, ODI will generate the loader scripts. If I choose “External Table”, it generate the Insert and select statement using the external table feature.
Knowledge Module isolates the logical mapping from the different technology, so the mapping can be used from different platforms and DBMS systems. At the design time, I can specify which knowledge I want to use by the drag and drop interface. At the run time, the tool will use the Knowledge Module I specify, and dynamically generate the codes and process the job. I feel that it is very powerful and is the key component for ODI to support the hetergenious environment.
We can add our own knowledge module for the reusable components, opportunities for partnerships.
Posted on the March 29th, 2007. Read times
Source:
Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas...
[link]
Posted on the March 29th, 2007. Read times
Source:
Chris Webb's BI Blog
[link]
Yes folks, in conjunction with the UK SQL Server Community I’m proud to announce another BI Evening on the 26th of April and - by popular demand - it’s going to be in central London. We’ve got a star-studded line-up of speakers too: David Francis talking about PerformancePoint Monitoring and Analytics; the legendary Allan Mitchell on dimension and fact table ETL methods with SSIS; and all the way from the US of A (though not just for this event, he was coming here anyway) Reed Jacobson on PerformancePoint Planning and Budgeting. You’ll detect a bit of a PerformancePoint theme going on here…
Anyway, here’s the link to register:
Spaces are limited so you’d better hurry. There’s also a non-BI SQL Server event going on at the same time, and since both are being sponsored by those nice people at Red-Gate software I’m told there will be all sorts of goodies to give away.


Posted on the March 29th, 2007. Read times
Source:
Pete-s random notes
[link]
The first on a occasional series of questions I might ask in an interview - or perhaps would like to be asked if someone were to interview me…
How do you design a data warehouse?
Now that is a vague question - am I looking for an insight to the analysis and design process (requirements gathering and other […]
Posted on the March 29th, 2007. Read times
Source:
Chris Webb's BI Blog
[link]
Great post here from Erik Veerman on a way of optimising some many-to-many relationships:
I blogged about something similar, but applicable in slightly different scenarios, here:
In general the idea is that if you can reduce the size of any dimensions and measure groups involved in a m2m relationship then you’ll get better performance. I believe partitioning your intermediate measure group appropriately can also help.


Posted on the March 29th, 2007. Read times
Source:
Pete-s random notes
[link]
I could be thinking of a new presentation to write, but end up being the house-husband and not achieving my objectives. However, wishful thinking of meetings to come interrupt my day dreams
Posted on the March 28th, 2007. Read times
Source:
BI this week
[link]
Whether it’s Oracle and Hyperion or the future of BI and PM, Information Builders chief Gerry Cohen doesn’t mince words.
Posted on the March 28th, 2007. Read times
Source:
BI this week
[link]
You’ve often thought it, but it took consultancy Gartner Inc. to confirm it: not every organization has a coherent business intelligence strategy.
Posted on the March 28th, 2007. Read times
Source:
BI this week
[link]
An independent Teradata will be a nimble competitor, Teradata officials argue — nimble enough to give appliance competitors a run for the limelight.
Posted on the March 28th, 2007. Read times
Source:
Mark Rittman's Oracle Weblog
[link]
I mentioned a while ago that I’ll be doing three presentations at the upcoming EOUC Oracle User Conference in Amsterdam in May. This is the first such pan-European event for a while, and looking at the agenda there’s a pretty good range of speakers (myself excepted of course…)
Anyway, if you register now you can take advantage of a discount code that gives you 10% of the registration fee. The code is “848734121875″ and you should quote this when filling in the online registration form. Of course as well as the conference, Amsterdam is just down the road, which seems to me a cast-iron excuse to go over there for the conference and stay for the weekend. Hopefully see you there if you make it along.
Posted on the March 27th, 2007. Read times
Source:
Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas...
[link]
Posted on the March 27th, 2007. Read times
Source:
Blog: Dan E. Linstedt
[link]
As it turns out, there are a lot of applications of the Data Vault architecture which I’ve built over the past 15 years. As testimony to the efforts, there are a few companies who’ve built concepts, and data models on the Data Vault architectures, then proceeded to patent the data models and the processes around the data models in order to provide competitive edge. I think the Data Vault architecture has finally grown up. If investment bankers can see the value of the Data Vault architecture and are willing to fund a patent effort on the models built based on the standards, then there must be a correlation between the standards in the architecture and the understanding of business users.
Posted on the March 27th, 2007. Read times
Source:
Data Doghouse - performance management, business intelligence, and data warehousing
[link]
Posted on the March 27th, 2007. Read times
Source:
Data Doghouse - performance management, business intelligence, and data warehousing
[link]
Posted on the March 27th, 2007. Read times
Source:
Chris Webb's BI Blog
[link]
Via
Darren Gosbell, I see that the best practices included in the forthcoming SQL2005 Best Practices Analyzer have been turned into a white paper which you can download here:
Most of the recommendations are fairly obvious, some were new to me, some provoked a ‘they need to fix that’ response, and some I wasn’t sure I agreed with 100%. For example:
Avoid creating diamond-shaped attribute relationships
I quite often find myself designing diamond-shaped relationships in my attributes; the example they give in the text isn’t particularly good, but Day->Month->Year and Day->Week->Year is very common. OK most of the time I design user hierarchies along these paths anyway, but this need to design user hierarchies just for the sake of performance rather than because users actually want to see these hierarchies (and this isn’t the first time I’ve heard this recommended) is something that I don’t feel particularly happy about. It would be nice if we could make the choice between using attribute hierarchies and user hierarchies based on ease-of-use alone.
Avoid including unrelated measure groups in the same cube
This is the multiple measure groups and one cube vs multiple cubes with one measure group question that has been kicking around for at least a year. We were promised it was going to be addressed in the AS2005 Performance Guide but it wasn’t; when are we going to get some details published? In my experience I have seen a slight improvement in performance when you split cubes up in this way but nothing major, yet clearly there must be some reason for MS to keep recommending this so perhaps I’ve not come across the right scenario yet.
Avoid having very big intermediate measure groups or dimensions of many-to-many dimensions
Obviously having large intermediate measure groups and/or dimensions in m2m relationships is going to slow things down, but in my experience I’ve always been pleasantly surprised with the performance of m2m dimensions. I’ve seen intermediate measure groups with much more than 1 million members in perform really well…
Avoid having partitions with more than 20 million rows
As an aside, I recently noticed that in Eric Jacobsen’s recently blog entry on partitioning
here he says you can also think about 2Gb (rather than a number of rows) as a rough guideline for the maximum size of a partition, and more interestingly that you should not have more than about 2000 partitions in your measure group. I asked him about this and it turns out that there are some internal limitations in AS at the moment, which hopefully will be fixed soon, that mean that slow down performance when you have large numbers of partitions. This does suggest that at the moment roughly speaking there’s a maximum size for a MOLAP cube to perform well of around 4Tb or 30 billion rows… not that I’ve ever seen a AS2005 MOLAP cube that big, but interesting to note.
Do set the Slice property on partitions that are ROLAP or partitions that use proactive caching
Based on Eric’s article mentioned in the last point, I think it’s also a good idea to set the Slice property on MOLAP partitions where possible too given that the automatic detection of which members appear in which partitions gets confused by overlapping slices. Greg Galloway recently did some experiments on this which I saw where he showed that if you set the slice on a MOLAP partition the auto-slice information isn’t used.
Avoid creating aggregations that are larger than one-third the size of the fact data
I quite frequently find myself designing aggregations which fail the one-third rule, although not by much, and for certain queries and calculations it can be a good idea. But I agree you should only do so as a last resort.


Posted on the March 27th, 2007. Read times
Source:
Chris Webb's BI Blog
[link]
Via
Darren Gosbell, I see that the best practices included in the forthcoming SQL2005 Best Practices Analyzer have been turned into a white paper which you can download here:
Most of the recommendations are fairly obvious, some were new to me, some provoked a ‘they need to fix that’ response, and some I wasn’t sure I agreed with 100%. For example:
Avoid creating diamond-shaped attribute relationships
I quite often find myself designing diamond-shaped relationships in my attributes; the example they give in the text isn’t particularly good, but Day->Month->Year and Day->Week->Year is very common. OK most of the time I design user hierarchies along these paths anyway, but this need to design user hierarchies just for the sake of performance rather than because users actually want to see these hierarchies (and this isn’t the first time I’ve heard this recommended) is something that I don’t feel particularly happy about. It would be nice if we could make the choice between using attribute hierarchies and user hierarchies based on ease-of-use alone.
Avoid including unrelated measure groups in the same cube
This is the multiple measure groups and one cube vs multiple cubes with one measure group question that has been kicking around for at least a year. We were promised it was going to be addressed in the AS2005 Performance Guide but it wasn’t; when are we going to get some details published? In my experience I have seen a slight improvement in performance when you split cubes up in this way but nothing major, yet clearly there must be some reason for MS to keep recommending this so perhaps I’ve not come across the right scenario yet.
Avoid having very big intermediate measure groups or dimensions of many-to-many dimensions
Obviously having large intermediate measure groups and/or dimensions in m2m relationships is going to slow things down, but in my experience I’ve always been pleasantly surprised with the performance of m2m dimensions. I’ve seen intermediate measure groups with much more than 1 million members in perform really well…
Avoid having partitions with more than 20 million rows
As an aside, I recently noticed that in Eric Jacobsen’s recently blog entry on partitioning
here he says you can also think about 2Gb (rather than a number of rows) as a rough guideline for the maximum size of a partition, and more interestingly that you should not have more than about 2000 partitions in your measure group. I asked him about this and it turns out that there are some internal limitations in AS at the moment, which hopefully will be fixed soon, that mean that slow down performance when you have large numbers of partitions. This does suggest that at the moment roughly speaking there’s a maximum size for a MOLAP cube to perform well of around 4Tb or 30 billion rows… not that I’ve ever seen a AS2005 MOLAP cube that big, but interesting to note.
Do set the Slice property on partitions that are ROLAP or partitions that use proactive caching
Based on Eric’s article mentioned in the last point, I think it’s also a good idea to set the Slice property on MOLAP partitions where possible too given that the automatic detection of which members appear in which partitions gets confused by overlapping slices. Greg Galloway recently did some experiments on this which I saw where he showed that if you set the slice on a MOLAP partition the auto-slice information isn’t used.
Avoid creating aggregations that are larger than one-third the size of the fact data
I quite frequently find myself designing aggregations which fail the one-third rule, although not by much, and for certain queries and calculations it can be a good idea. But I agree you should only do so as a last resort.


Next Page »