Methodology as a Trojan horse
Source: Sistemas Decisionales, algo mas que Business Intelligence [link]
If we add to this the semi-structured nature of decision-making processes, we find that to apply an agile methodology in a Business Intelligence project we need to be very clear about what we are facing. For this reason, the first project of this kind carried out in the organization must not only follow a rigorous and serious methodology but also "evangelize": it must be a Trojan horse that infects every area of development in the organization.
One methodology (which, unfortunately, its author has not continued to develop) is Adaptive Software Development. It starts from the idea that the client's needs are always changing during the development of the project (and after its delivery), which suits decision-support systems perfectly.
Its champion is Jim Highsmith. What is novel about it is that it is not really a software development methodology but a method (like a Trojan horse) for instilling an adaptive culture in the company, since the speed at which the company adapts to change will mark the difference between a thriving company and a declining one.

So we have a methodology (more of a mini-method, really) with 4 clear objectives (regardless of whether the project is a success):
- Make the organization aware that it should expect change and uncertainty, not order and stability.
- Develop iterative change-management processes.
- Facilitate collaboration and interaction between people at the interpersonal, cultural and structural levels.
- Set out a strategy for rapid application development, but with rigor and discipline.
The life cycle it proposes is based on three phases.
Phase 1: Speculation. The project is started and the features of the application to be developed are planned.
Instead of analyzing, designing, implementing, etc., which is the usual language, with its somewhat boastful semantics of infallibility and definitude (I think I just made that word up), we move on to speculating, collaborating and learning together, and I think that is a good trait for a methodology, whatever its field of application.
Please do not leave any comments on this post (I am trying reverse psychology to see if I get more participation).
The Five W’s of Database Restores
Source: OLAP/BI/IM stuff [link]
Your 2007 Head Check: Seven New Business Intelligence Gotchas
Source: OLAP/BI/IM stuff [link]
Last Night’s BI Event
Source: Chris Webb's BI Blog [link]

The crucial question: Which BI-Tool is able to aggregate this correctly? I say: “NONE!”
Source: The Data Warehouse Blog [link]
I really want to know! I am asserting that none of the BI tools on the market today is capable of calculating the right aggregations for the following real-world task.
Again, we’re looking at a dimension with diamond shapes:
On the right you can see a part of the sales force of a financial services company.
C1-C4 is a subset of the clients, R1 and R2 are two sales reps and M1 is a regional manager.
For each client, the measure “a” is given, which means the yearly amount of money available to the client for investment. In order to calculate the investment potential for each node in the hierarchy, “a” has to be summed up.
It is typical for financial services companies that each rep is expected to fully exploit a client’s potential. That means it is not an option to use weighting factors to share the client’s potential between reps. As you can see on the right, C2 is assigned to two reps, so C2’s full investment potential has to be assigned to both R1 and R2.
For M1 each client’s potential must be taken into account only once.
To have something to work with, I give you the investment potentials for each client:
- C1: 2.000
- C2: 7.000
- C3: 8.000
- C4: 4.000
Here are the expected results of the aggregation:
- R1: 9.000
- R2: 19.000
- M1: 21.000
Please do not hesitate to leave your comments here. I’m still convinced that there is no BI tool available that can solve the above-mentioned problem. These kinds of aggregations and hierarchies are everywhere, and it would be a quantum leap to finally have an out-of-the-box solution that could cope with them.
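Outside of a BI tool, the intended semantics can at least be written down in a few lines. The following is a minimal sketch, not an out-of-the-box solution: the dictionaries and function names are hypothetical, and the assignment of C1 and C2 to R1 and of C2, C3 and C4 to R2 is an assumption inferred from the expected results, since the picture is not reproduced here. Each node collects the set of distinct clients below it, so a shared client counts in full for every rep but only once for the manager.

```python
# Hypothetical encoding of the hierarchy: each node lists its direct children.
# The rep-to-client assignment is an assumption inferred from the expected results.
children = {
    "M1": ["R1", "R2"],
    "R1": ["C1", "C2"],
    "R2": ["C2", "C3", "C4"],
}

# Yearly investment potential "a" per client, as given in the post.
potential = {"C1": 2000, "C2": 7000, "C3": 8000, "C4": 4000}


def clients_under(node):
    """Return the set of distinct clients reachable from a node."""
    if node in potential:
        return {node}
    found = set()
    for child in children.get(node, []):
        found |= clients_under(child)
    return found


def investment_potential(node):
    """Sum "a" over distinct clients, so a shared client is counted once per node."""
    return sum(potential[client] for client in clients_under(node))


results = {node: investment_potential(node) for node in ("R1", "R2", "M1")}
print(results)  # {'R1': 9000, 'R2': 19000, 'M1': 21000}
```

The design choice is to aggregate from the raw client level for every node instead of rolling up the results of the level below, which is where the double counting of C2 would creep in.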
Welcome to my real world experience!
Source: The Data Warehouse Blog [link]
Welcome to my Data Warehouse Blog! To illustrate the motivations to come up with this blog I’d like to tell you a little story first.
Back around 1995 I was working for a software company that was the market leader for Sales Force Automation (SFA) and Electronic Territory Management Systems (ETMS) software for the pharmaceutical industry in Germany. As the leader of the server development team I was in charge of the programs and the underlying Oracle database, which guaranteed a multi-directional data flow between the different sales reps and the head office. Reps were reporting sales calls and additional data about their activities as well as certain characteristics of the doctors, pharmacies, and hospitals they visited. Additionally, pharmaceutical companies were buying turnover and sales volume data from external providers based on geographical segments, time, and different levels of a self-defined “product hierarchy”.
All this valuable data was residing in the database on the server, which obviously triggered the desire for analytical applications.
Long before I became responsible for the server, an analytical application had been developed by another team. This application no longer satisfied the growing demands of the very heterogeneous user base, and thus we decided to abandon it and to develop a new suite of analytical applications, based on a real data warehouse, from scratch.
Then, someone had the very smart idea to check the market for out-of-the-box solutions! Plus, some of our customers were already using BI systems like Cognos. After contacting Cognos we agreed to hold a three day in-house workshop as a proof of concept and soon, three Cognos staffers showed up at our office in Heidelberg: The unavoidable sales guy, a senior pre-sales consultant, and a quite attractive and very nice young woman, who seemed to be new to the company and apparently doing some training on the job. I can’t recall her name today, so let’s call her Lucy.
It seemed as if the Cognos people thought: “Yet another prospect with some data to load, to aggregate, and to render in a couple of fancy reports. That’s just right for Lucy to get some more hands-on experience. We have already taught her how to deal with multi-dimensional data and how to customize the system according to the client’s requirements.”
To make a long story short, here is what happened: At the end of the third day we were all heading to our conference room, where Lucy was expected to present what she had been working on for the last three days. As we entered the room we found Lucy in a confused state. She was crying, unable to say a word. We all looked at each other and everybody was really shocked. It was a quite embarrassing situation.
Lucy explained to us that she couldn’t even manage to model the product hierarchy, let alone aggregate any data. She had asked almost every expert at her company, and even though we also tried to supply as much information and help as possible, she couldn’t figure out a solution. The truth was that there was no solution. The Cognos system simply was not ready for this kind of product hierarchy, which in turn is common to almost every pharmaceutical company in the world. The same holds true for the organizational and geographical dimensions of a pharmaceutical sales force.
I’m going to describe the above mentioned dimensions and hierarchies in great detail in the upcoming posts, but, in a nutshell, their main characteristics are:
- There are no fixed hierarchy levels; the hierarchies are purely parent-child
- Each node can be either the parent or the child of a node of any type
- Each node can have any number of parents
- Raw data can occur at any node in the hierarchy
- The data value of a node can be the result of an aggregation of the data values of its children
- The data value of a node can be raw data from an external data source, deviating from the aggregate of the children
After it had become clear to me that there was not a single out-of-the-box solution available that could cope with these hierarchies, I decided to pick up the original plan to develop a data warehouse model from scratch.
In the course of the next couple of weeks or even months I’m going to describe the static and dynamic data warehouse model, which has matured over the last ten years. The hierarchies described above are only one of several features that can seriously jeopardize the success of a data warehouse project. Many of those features will be covered in the upcoming posts (e.g. time variance, completeness of meta data, ETL tools, skills of the team members, hardware, etc.).
Finally, I can’t resist telling you what I found out just a few weeks ago. I was giving a presentation of the prototype of an OLEDB for OLAP provider I have developed recently (more on that in an upcoming post). It took place in the German head office of one of the world’s leading players in the pharma market. They are using Cognos as their BI platform, and obviously I asked them how they deal with the organizational sales force hierarchy. In a mixture of sarcasm and embarrassment they replied: “C’mon, you know it! We have to tell users not to look at any numbers above a certain level in the organization, because they are all wrong!”.
I don’t know what happened to Lucy after our workshop more than ten years ago. I hope she is doing as well as Cognos is doing today.
DWH Modeling Rule #1: Most aggregations have to be done in the Data Warehouse directly
Source: The Data Warehouse Blog [link]
If you have read my first post about “my real world experience” with out-of-the-box BI systems like Cognos, you might have gotten the impression that I was bashing Cognos. This is definitely not the case, since Cognos and other BI systems are great software products that offer a wide range of functionality. The point I was trying to make is that even the leading product in the BI market was, and still is, not able to cope with certain data structures. It’s not that these data structures are especially weird or uncommon; they have occurred in every data warehouse project I have been involved in so far.
The picture on the left depicts a typical hierarchy, which can often be found as the structure of a sales force.
C1-C4 are clients, who are assigned to the sales reps R1 and R2. The sales reps are both managed by regional manager M1.
A quite important measure for sales reps, managers, sales unit, and, of course, the company as a whole is the number of associated clients.
How would a typical BI tool be set up to calculate the number of clients based on the hierarchy on the left?
- The client level with members C1-C4 is defined as the raw data level. Each member has a client-id as a primary key.
- The measures for upper levels for the sales reps and the managers are aggregated by the system. These aggregations are either pre-calculated or take place on-the-fly. The aggregation rule is “count(distinct client-id)”.
- First, the measures for the sales reps are calculated with the following results: R1: 2, R2: 3
- Based on the results for the reps, the measures for the managers are calculated. The result for M1 would be 2+3=5, which is obviously wrong!
OK, most Data Warehouse and BI people know that problem, which is often referred to as “diamond shapes in hierarchies”. This term is derived from the typical diamond shape that arises when two nodes have a child node and a parent node in common. You can (vaguely) see the diamond formed by C2, R1, R2, and M1 in the picture above.
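The double counting is easy to reproduce by hand. Here is a minimal sketch, assuming R1 serves C1 and C2 and R2 serves C2, C3 and C4 (the assignment implied by the counts above; the variable names are hypothetical). It compares rolling up the reps’ results with counting distinct clients directly from the raw level.

```python
# Assumed assignment, consistent with the counts R1: 2 and R2: 3 and with C2 being shared.
assignments = {
    "R1": {"C1", "C2"},
    "R2": {"C2", "C3", "C4"},
}
reports_to = {"M1": ["R1", "R2"]}

# Step 1: count(distinct client-id) per rep.
rep_counts = {rep: len(clients) for rep, clients in assignments.items()}

# Step 2a: what a level-by-level roll-up does: add up the reps' results.
naive_m1 = sum(rep_counts[rep] for rep in reports_to["M1"])

# Step 2b: what is actually wanted: distinct clients over the whole subtree.
distinct_m1 = len(set().union(*(assignments[rep] for rep in reports_to["M1"])))

print(rep_counts)   # {'R1': 2, 'R2': 3}
print(naive_m1)     # 5, C2 is counted twice
print(distinct_m1)  # 4
```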
The diamond shape problem is sometimes solved with weighting factors for the parents that share a common child. To achieve correct results on the higher levels, the weighting factors of each child have to sum to 1. In the hierarchy above, R1 and R2 would both have a factor of 0.5 for C2.
The weighted measure would be:
- M1: 1+1+0.5+0.5=4, which is absolutely right!
- R1: 1+0.5=1.5; R2:1+1+0.5=2.5, which is, at least in my opinion, not only wrong, but also very cruel! Human beings are cut in half!
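The weighted arithmetic can be checked with a short sketch. The client-to-rep assignment is the same assumption as before (R1: C1, C2; R2: C2, C3, C4), and the names are hypothetical. Each (client, rep) link carries a weight, and the weights of a client sum to 1.

```python
# Weight of each (client, rep) link; C2 is split evenly between its two reps.
weights = {
    ("C1", "R1"): 1.0,
    ("C2", "R1"): 0.5,
    ("C2", "R2"): 0.5,
    ("C3", "R2"): 1.0,
    ("C4", "R2"): 1.0,
}

# Weighted client count per rep.
rep_totals = {}
for (client, rep), weight in weights.items():
    rep_totals[rep] = rep_totals.get(rep, 0.0) + weight

# The manager's figure is the sum of the reps' weighted figures.
m1_total = sum(rep_totals.values())

print(rep_totals)  # {'R1': 1.5, 'R2': 2.5}
print(m1_total)    # 4.0
```

The manager’s total comes out right precisely because the reps’ figures no longer describe whole clients.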
Another fine example of diamond shapes: In my first post I mentioned the “legacy” analytical software we wanted to replace with a new solution. One of the eye-opening flaws of that software occurred when we were aggregating sales data for Bayer. As you might know, Bayer is the producer of Aspirin. Now, Aspirin can be used for different indications, e.g. as a pain reliever (analgesic) or as a blood thinner. This is a typical example of a hierarchy node with more than one parent. I don’t have to tell you that all the calculated measures on the upper levels were wrong.
The quintessence of all this is that, in order to achieve correct aggregations, you have to calculate them yourself in the Data Warehouse before you render the results in a BI application or report. You cannot expect out-of-the-box systems to be able to deliver the correct results.
How to do it right and how to model your relational Data Warehouse by using an enhanced star schema will be described and discussed in the upcoming posts.
I will also address the problem of having raw data on different hierarchy levels.
DWH Managing Rule #1: The single most important prerequisite for success is a complete set of meta data
Source: The Data Warehouse Blog [link]
In my opinion, one of the very first things a DWH project manager should strive for is the definition of a complete and consistent set of meta data.
If this is done, requirements engineering, specification, documentation, and project management are nothing more than collecting meta data and assessing the completeness of the meta data set. Through priorities and processing sequences it is possible to completely define a procedural model for the DWH project.
When I speak of meta data, I do not only mean the more or less technical data, which describes dimensions and facts, but also data, which describes the warehouse process (ETL), and, most important, “political” data like target groups, stakeholders, team members, and other important people.
To get the most out of the meta data and to alleviate the collecting and administration of the meta data set, I frequently use a relational database. That allows me to generate a GUI for entering data and a number of different reports. Plus, this database can be used as a central repository for each member of the project team. For the project manager it can be of great help, if it contains typical project information like target date, status, estimated effort, remaining effort, responsibilities, etc. for the relevant entities.
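As a rough illustration of such a repository, here is a minimal sketch using an in-memory SQLite database. The table name, columns and sample rows are all hypothetical; they only mirror the kinds of project information mentioned above and are not the actual meta data model.

```python
import sqlite3

# Hypothetical, heavily simplified meta data repository (in memory for the example).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE meta_entity (
        name                  TEXT PRIMARY KEY,
        kind                  TEXT NOT NULL,  -- e.g. fact, dimension, stakeholder
        status                TEXT,
        target_date           TEXT,
        estimated_effort_days REAL,
        remaining_effort_days REAL,
        responsible           TEXT
    )
    """
)

# Invented sample rows, for illustration only.
rows = [
    ("sales_fact", "fact", "specified", "2007-03-01", 10, 4, "ETL team"),
    ("product_hierarchy", "dimension", "open", None, None, None, None),
    ("regional_managers", "stakeholder", "specified", "2007-02-15", 2, 0, "project manager"),
]
conn.executemany("INSERT INTO meta_entity VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Completeness check: which entities still lack planning information?
incomplete = [
    name
    for (name,) in conn.execute(
        """
        SELECT name FROM meta_entity
        WHERE target_date IS NULL
           OR estimated_effort_days IS NULL
           OR responsible IS NULL
        ORDER BY name
        """
    )
]

# Simple project report: total remaining effort over all entities.
remaining = conn.execute(
    "SELECT SUM(remaining_effort_days) FROM meta_entity"
).fetchone()[0]

print(incomplete)  # ['product_hierarchy']
print(remaining)   # 4.0
```

A query like the completeness check is one way gaps in the meta data set could surface early, before they turn into showstoppers.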
A big advantage, which is based on the completeness of the meta data set, is, that certain pitfalls and showstoppers can be identified at a very early stage of the project.
Here is an example from one of my projects: I’m always especially paranoid about historical variability like slowly moving dimensions (which often turn out to be rapidly changing dimensions). Hence there are a number of attributes in my meta data model that describe SMDs. In the (meta data based) process of specification and requirements engineering I asked the client about the historical variability of the product hierarchy. The people I asked were quite amazed, and apparently nobody in the company had ever thought about it. The question was: What happens to historical data when the product hierarchy changes? Does the change have to be applied to the historical data (especially aggregated data)? Through the procedural model implied by the meta data we were able to address the implications of the historical variability at a very early stage in the project, and we could force the client’s management to make a reliable decision. Very often, these kinds of aspects only surface when the BI system is already in production, jeopardizing the success of the entire project.
In one of my next posts, I’m going to describe the meta data model in more detail by identifying the different sections of the model and describing the attributes that make up the different meta data entities.
What are the most popular MDX functions in AS2005
Source: Mosha Pasumansky [link]
Almost two years ago, in January 2005, I did a little research on the popularity of MDX functions. The methodology was to search the Analysis Services newsgroup for the names of MDX functions. Since some MDX function names are also very common English words, they were excluded from the statistics, but there were enough functions left to do an interesting analysis. Back in January 2005 most of the posts on the newsgroup were about Analysis Services 2000. Around June 2005, Microsoft switched from public newsgroups to MSDN forums as its community support system. MSDN forums have many advantages compared to newsgroups: it is possible to mark posts as “Answers”, subscribe to email alerts for interesting threads, etc. The migration started slowly, but by now activity on the Analysis Services MSDN Forum seems to surpass activity on the microsoft.public.sqlserver.olap newsgroup (as judged by the number of new topics per day). Anyway, since MSDN forums started around the time when Analysis Services 2005 was only a couple of CTPs away from shipment, the vast majority of threads on the Analysis Services MSDN Forum are about AS2005. So I decided it would be interesting to compare the statistics about the 10 most frequently cited MDX functions in AS2005 vs. AS2000. Below are the results:
[Table: the 10 most frequently cited MDX functions, AS2000 vs. AS2005]
There are several interesting points to make here.
- The distribution of functions seems to be less skewed in AS2005. The top 10 functions made up 68% of all mentions in AS2000, but only 56% in AS2005. I don’t have a good explanation for why that is. It’s true that AS2005 has a few more MDX functions, but most of them had very little impact (KPIValue is mentioned only 3 times, for example), with NonEmpty being the only exception (and I counted neither Exists nor Existing, since both of them are common English words).
- CurrentMember is still the clear and undisputed leader, but it lost half of its market share, from 30% to 15%. Is it because people follow my and others’ advice to omit redundant CurrentMember in order to improve performance? Or, perhaps, they realize that redundant CurrentMember breaks multiselects in the WHERE clause, as described here.
- NonEmptyCrossJoin is kicked out of top 10 !!! And newcomer NonEmpty gets bigger market share - 3.34% vs 2.98% (although a lower rank in the table). I like to think that people realize that NonEmptyCrossJoin is evil, and NonEmpty is a good, performant replacement when applicable.
- YTD rose from position 8 to position 3 in the table, and gained market share from 2.27% up to 6.43%. YTD is almost exclusively used in running sum calculations, i.e. SUM(YTD()), and everybody should be happy that exactly this kind of calculations is now much much faster in SP2!
- IIF dropped both in position, from #2 to #4, and in market share, from 10.69% to 6.19%. I don’t hide that I don’t like the IIF function, and I always tell people to rewrite their MDX not to use IIF. The most common technique is to use SCOPE instead. SCOPE is not an MDX function, but I counted it as well, and if it were an MDX function it would have made the table in 5th place with a 5.5% market share. So could it be that people realize that IIF should be replaced with a proper SCOPE? (And it’s not only me writing about it in the blog here, here and here: the official Analysis Services Performance Guide talks about it as the first thing in the “Writing Efficient MDX” chapter!)
- Descendants lost some market share, and slipped from position #3 to position #5. This can be probably attributed to the fact, that in UDM world Descendants is not as essential. For example, Descendants(Geography.USA, Geography.City, SELF) can be rewritten using attribute hierarchies as Exists(City.City.MEMBERS, Country.USA). Or inside calculations SCOPEs, the explicit use of Exists is not even needed because of autoexists, i.e. SCOPE (City.City.MEMBERS, Country.USA) is preferable to SCOPE (Descendants(Geography.USA, Geography.City, SELF))
- TopCount remains popular enough to just make the Top 10. Again, everybody will be delighted that SP2 fixes problems with TopCount and the WHERE clause when they use attributes from the same dimension.
- IsEmpty is gone from the table (it is now #16, down from #9). It got replaced with the NonEmpty function, which is a better way to handle Filter(…, IsEmpty(…)).
- Avg got replaced by Max on the 7th place. I have no good explanation for that. Perhaps usage of Avg declined due to AverageOfChildren semiadditive measure ?
I am sure that some of the speculations above are just my imagination, but the overall trend is good. The use of functions which I labeled as “bad/deprecated” in AS2005 has declined, and the use of good functions has increased. The OLAP world is becoming a better place.
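The counting methodology described at the top of this post can be sketched roughly as follows. This is not the script used for the statistics: the sample posts are invented, the function list is a small subset, and counting every whole-word mention (rather than, say, one hit per thread) is an assumption.

```python
import re
from collections import Counter

# Invented sample posts; the real study searched the newsgroup and the MSDN forum.
posts = [
    "Use [Date].CurrentMember with YTD() for a running sum",
    "Replace IIF with SCOPE; CurrentMember is redundant here",
    "NonEmpty is faster than Filter with IsEmpty, if the member exists",
]

# A small subset of MDX function names, for illustration only.
functions = ["CurrentMember", "YTD", "IIF", "NonEmpty", "IsEmpty", "Filter", "Exists"]

# Names that are also common English words are left out of the statistics.
excluded = {"Exists", "Existing"}


def function_shares(posts, functions, excluded):
    """Return each function's share (in percent) of all counted mentions."""
    counts = Counter()
    for name in functions:
        if name in excluded:
            continue
        # Whole-word match, so NonEmpty does not also match NonEmptyCrossJoin.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        counts[name] = sum(len(pattern.findall(post)) for post in posts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items() if n}


shares = function_shares(posts, functions, excluded)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.2f}%")
```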
erp4it: DBMS configuration management
Source: OLAP/BI/IM stuff [link]
erp4it: DBMS configuration management: “I am in search of information specifically related to database ‘instance’ configuration management. Any information/opinions would be great. To be precise, what I mean by ‘instance’ would be the configuration of, say, Oracle 10gr2 or Sybase ASE 15 that is mounted on a host OS (e.g. on HP-UX 11.11). Let me explain: What we are being solicited for by our large user base is management (e.g. large scale global management) of the database software that is mounted on specific types of ‘boxes’. “
Atlassian: A company I hold in high esteem
Source: bayon blog [link]
Atlassian, makers of Jira and Confluence, is an exceptional company in my opinion.
- They make a solid product that gives users the “I kick ass” feeling.
- They understand the benefit of making it easy to BUY software, instead of SELLING software to people. You can eval their product and purchase it on your CC and expense it. No stiff suits and long high touch sales cycles.
- They’re open. They have open APIs, plugins, modules, web services, work with about any app server/db, have transparent discussions about product/features/bugs in public.
- They’re “open source — eee” without an open source license. They are all the great things about community, openness, flexibility, and choice; they are NOT themselves open source but contribute symbiotically with code and free licenses.
- They’re HONEST about their open source stance: We contribute to core projects, give our product for free, but we are NOT open source ourselves. “It’s really quite simple, Open Source (capital O, capital S) means that the software has an OSI approved license. If it doesn’t, don’t use the term.”
- Young, smart, energetic, smart, focused, did I mention smart?
I’d been waiting to publicly describe my regards for this company but this just put me over the edge:
any Atlassian employee can spend up to 6 paid work days a year working for non-profits or charities of their choice.
he 20% Google employees get for any project, but WOW. What a commitment to values beyond the bits. Really shows me that the “community” that Jira believes in is more than just lip service for software sales. They believe it.
Kudos. I look forward to suggesting to everyone I know to purchase your product.
For those looking for work
Source: Pete-s random notes [link]
I read from Gartner Dataquest that the EMEA BI market grew 15.5% in 2005 and with compound growth a fraction under 10% predicted for the rest of the decade it looks a good area to work in.
Amazon Web Services Success Stories
Source: OLAP/BI/IM stuff [link]
Amazon Web Services Success Stories: “We have written before about the innovative Amazon Web Services Platform. This stack was officially announced by Amazon CEO Jeff Bezos during the recent Web 2.0 summit and is now considered part of the core business strategy for Amazon. While analysts, competitors and Wall Street are pondering what to make of this move from a business sense, in this post we look at who is utilizing Amazon Web Services - and how. This post is based on personal communication with those people, along with the set of success stories available on the Amazon Web Services site.”
Email to OSI license-discuss re: Generic Attribution Provision
Source: bayon blog [link]
From me, to Ross and license-discuss:
Socialtext which wishes to find a resolution for the attribution issue through the proposal of a Generic Attribution Provision. A copy of the following message is available in HTML format here: https://www.socialtext.net/stoss/index.cgi?attribution_memo I look forward to the conversation,
Ross, as I commented on a ZDNet thread, you’ve earned my respect (not that it matters) by bringing your license to OSI and having a real discussion about UI attribution. I’m one of the critics of UI attribution licenses, but I’m glad someone brought it to a place where forced UI attribution can be vetted against the OSD in a reasonable manner. I do hope you receive the criticism of this provision in that light.
needs than Linux. These application products could be “lost” in the larger distributions. The obligations imposed by the attribution
I’m uncertain why copyleft licenses don’t meet the needs for “avoiding larger distrib hiding” without compensation. Where does a license like the GPL not suit your needs? If people are hiding it in their distributions will GPL (even with FOSS exception) not meet your needs?
provision are very similar to the reproduction of legal notices which are found in virtually all open source licenses.
Attribution in documentation, source, and a splash screen at startup (the approvals to date) is not “very similar” to a required use of a trademark that is not owned by the person fulfilling their obligations. Redhat has sent letters to downstream vendors clearly stating their rights to not have their trademarks used. They are quite different (more below).
However, we understand that attribution may cause problems for OSI, particularly since different companies may have different attribution notices and may use different “base” licenses (all recent attribution agreements are based on the MPL). Socialtext would like to suggest that OSI consider an “attribution” provision which can be used for any “modifiable” license.
I think an attribution provision, limited to be reasonable and in line with previous approvals (docs, splash page, etc.), would not be detrimental to the open source effect. I think having it as a general provision would help with license proliferation and remove objections from these companies just writing their own licenses willy-nilly.
Generic Attribution Provision Redistributions of the [original code] in binary form or source code form, must ensure that each time the resulting executable program, a display of the same size as found in the [original code] released by the original licensor (e.g., splash screen or banner text) of the original licensor’s attribution information, which includes: (a) Company Name (b) Logo (if any) and (c) URLs
IMHO this allows exactly the problem with the current Mozilla Exhibit B going around. This provision allows a “blank check” because HOW the original code attributes determines how prominent or onerous it is. I’m not a player in this (ie, OSI board) but I’ve suggested alternative language below that would limit to more common and appropriate attribution locations. This is similar to what MOST people do currently (in Documentation, About Page if present, etc). Common places to acknowledge the work that contribute to the whole application. Remember the open source effect is TRYING to make multiple open source projects open for reuse in subsequent applications NOT limit those uses.
POSITION STATEMENT 1. Consistent with OSD. Attribution is merely a form of notice which is consistent with Section 4, the Integrity of the Author’s Source Code, of the Open Source Definition. Virtually every OSI approved license requires the inclusion of copyright and other legal notices (and frequently more elaborate information, see below). The attribution requirement is similar to this notice requirement.
ilar in that they are both attribution, but they are quite dissimilar in their burden. Let’s be clear: there is a BIG difference between a note in a splash screen or document and a badge (among many) on EACH UI screen.
Consider other attribution circumstances: The artist that samples (perhaps even with payment) a Beatles song attributes the original copyright and work in a page inside the CD. Or the inside CD cover, or … The attribution clause proposed would require the new artist to place Powered by the Beatles(tm) on EVERY customer facing “screen” such as concert posters, CD covers, websites, etc.
Rationale: Encouraging lots of improvement is a good thing, but users have a right to know who is responsible for the software they are using. Authors and maintainers have reciprocal right to know what
We’re in perfect agreement that no one should try and hide the source of the source, but should users be required to know the hundred or so names of developers that contributed to the Apache web server when they go to a website? Should someone claim they wrote it? No. Should someone remove license copyright notices? No.
they’re being asked to support and protect their reputations. Accordingly, an open-source license must guarantee that source be readily available, but may require that it be distributed as pristine base sources plus patches. In this way, “unofficial” changes can be made available but readily distinguished from the base source.
nk the source is an issue, is it? I’m not sure anyone is saying that there should be no attribution in source files.
2. Already Approved. OSI has approved several licenses which include attribution, Attribution Assurance License, Open Source License and the Adaptive Public License, as consistent with the Open Source Definition.
I wasn’t around for the rationale but I’m guessing they were limited enough in scope, as to not limit rights but rather provide modest attribution in appropriate “give credit where credit is due” places. I don’t think they intended approving attribution to dictate a visual logo (advertisement) on EVERY UI screen. Perhaps others can revisit this for the benefit of everyone?
2. Redistributions of the Code in binary form must be accompanied by this GPG-signed text in any documentation and, each time the resulting executable program or a program dependent thereon is launched, a prominent display (e.g., splash screen or banner text) of the Author’s attribution information, which includes: (a) Name (“AUTHOR”), (b) Professional identification (“PROFESSIONAL IDENTIFICATION”), and (c) URL (“URL”).
Notice it says launched. ie, notice of copyright NOT a UI banner on each screen.
3. Not a Burdensome Requirement. Some individuals have expressed concern that attribution requirements will result in products where the screens are filled with logos. Yet, by their nature, licenses with attribution will only permit the original licensor to include its logo since the license cannot be amended by sublicensors. Many open source
at multiple UI attribution license can not be combined?
If they can be combined, the UI filled with badges could be VERY real if enough people use badgeware. Certainly many of the badgeware applications would look different if the open source projects they used had the same requirement. I counted 18 for Sugar; a testament to how good open source projects can benefit everyone. How many OSI-approved licenses does SocialText use? If you send me the list of the projects SocialText used, I’m happy to create a mockup of YOUR UI if we live in a badgeware world. You can see what the commercial viability of your software would be if those that came before believed they deserved this same right. Do you have a splash screen or an about page for every END USER of SocialText listing all the COPYRIGHT holders you’ve used? Remember, open source projects often have MULTIPLE copyright holders that might have different attribution clauses. Joe Jimmy from Jersey and Susy Soody from Sarasota could probably have their picture included.
If they can not be combined (can you explain what you mean if I’ve misunderstood) then that seems to be in conflict with OSD; and VERY much against the open source effect. Open Source strives to make it easy for people to reuse code; increases quality and decreases duplicate work. Not being able to combine two licenses that are both OSI approved would be, well, very odd.
4. Applications. The needs of “application” open source software are different from the more traditional “operating system” open source software. Application software is frequently distributed by third parties with other products without any notice to end users; this
o say that infrastructure OS is more commonly distributed then applications.
By different are you really saying better? Somehow, because you are writing code that is geared more towards end users, it is worth MORE protection and recognition than everyone else’s? For instance, take the ~7500 lines of code that make up “grep”, which are used ubiquitously and with great utility: they receive no such place on the UI screen of a web application that is a few simple lines of perl code. In all the code that comes together to display a web page, process a record or do any number of application things, why is the 5% these application companies wrote (some php code) special? Each took $$ and time to make, but somehow an application is special?
I don’t understand… Please do share why widely distributed code built using real developers and real costs ($$) are worth less than what application companies are writing.
possible under open source licenses without attribution. For example, the incorporation of application programs anonymously into distributions by large companies could destroy the market for open source application software.
Again: consider using a copyleft license if you want to prevent this. This force you claim can destroy the market can be mitigated by a license which makes it difficult, if not impossible to embed. Why does GPL not meet this need?
5. Part of a Larger Problem. Some individuals have expressed concern that the attribution licenses are not approved by OSI. Yet, many other modifications of open source licenses have not been approved by OSI, such as FOSS and Affero. OSI should address the entire problem or can be accused of selective enforcement.
Preaching to the choir here. The OSI reviews licenses. They won’t review licenses that aren’t submitted by the original authors. SO… There’s no selective enforcement. It’s a community thing: people like me applying pressure to companies that claim to be open source but have not passed the de facto vetting against the definition of open source.
6. Community Acceptance. These licenses are used by Socialtext, Zimbra, Alfresco, Qlusters and SugarCRM. Yet their communities have not expressed objections to this requirement. Many of these companies are building business models which include distribution by third parties so the distributors do not have a problem with this approach.
Hey, Shareware with no expiration date is “beer” software. I’m not sure you’d hear people complaining about receiving free to use software. Oracle gives away a version of its database; the Oracle community isn’t complaining about that. Calling it open source, with the connotation that it has freedoms in USE, is different. Microsoft communities will sing the praises of their benefactor. That people like the software you let them run for free doesn’t mean it’s accepted by an open source community.
Are any of these downstream distributors able to legally use your trademarks in the distribution without some understanding or agreement from you? All is fine and dandy when blessed by parties with a business arrangement. Can a competitor legally use your code and trademark while you have no recourse to stop them?
7. Consistent with Creative Commons. Creative Commons includes “attribution” as one of the key decisions that need to be addressed in using their licenses.
Creative Commons is applied in proper places. ie, a note in a post etc. If I quote from a creative commons piece I have to make a note of it somewhere. I don’t have to place their logo on my entire website.
8. Not BSD Advertising Requirement. An attribution requirement is not similar to the “advertising” requirement. It does not impose “vague” requirements to mention the Berkeley Software Distribution in undefined “advertising”. On the contrary, it is very specific and easy to understand and comply with.
Well, I think the original Exhibit B is actually MORE specific than the proposed Attribution Provision. As onerous as it may be, it at least spells out the pixel size so it’s CLEAR the impact that each combined license will have. This requirement leaves it up to the original author how onerous the provision is (pixel size, watermarks, etc).
I’d like to suggest an alternative attribution provision that provides the same “attribution to the user” but is much less onerous. I’m no attorney so I’m perfectly willing to let it be wordsmithed by anyone on this list. I disagree with it, but suggest it as a useful compromise and for discussion. I’m a strong critic of UI attribution and something like the following would remove most of my objections (not that my objections are any more or less valuable than anyone else’s):
Redistributions of the original code in binary form or source code form, must ensure that each time the resulting executable program, a display of items (a),(b),(c) released by the original licensor on a splash screen or about page of the original licensor’s attribution information if such a splash screen or about page is present in redistribution, which includes:
(a) Company Name
(b) Logo (if any) and
(c) URL
Original Licensor grants limited use of Trademark and Logo as necessary to fulfill obligations of this provision.
Note: the “diffs” are clearer in HTML on my blog should anyone wish to review it there.
http://www.nicholasgoodman.com/bt/blog/2006/11/27/compromise-attribution-rider-on-any-osi-license/
Kind Regards,
Nick
The leading OLAP expert speaks: Nigel Pendse
Source: Todo BI: Business Intelligence, Data Warehouse, CRM y mucho mas... [link]
As we have already mentioned in some earlier posts (here, here and here), Nigel Pendse has for some years been the leading expert on OLAP through his indispensable The OLAP Report.
For that reason, the interview below, conducted by our friends at IS Portal, with whom we collaborate by contributing some of our commentary, deserves our full attention. The OLAP market, which for a while was thought to be in danger of disappearing, seems to be coming back with renewed strength: inclusion in relational engines (Oracle, MSFT..), Open Source versions (Mondrian, Palo), new cubes accessible from Excel, etc…
Nigel, you are known worldwide as one of the most important analysts of OLAP, Business Intelligence and also Corporate Performance Management (CPM). Could you briefly describe what your work consists of?
The main part of my work is the “OLAP Report”, which includes a free subscription that is continuously available online. We also run an annual “OLAP Survey”, which is a quantitative analysis of the experiences of a large number of customers drawn from different fields. What sets them apart is that the “OLAP Report” is based above all on opinions, while the “OLAP Survey” centers on thousands of real customer websites.
How many customers or users normally take part in your survey?
This year we had 2,100 real users in the “OLAP Survey”. The total sample came to 5,000, but some of those were not users or did not supply their data. I believe there is no similar survey that gathers such a large database. In addition, because many of this year’s questions had also been asked in previous years, a trend analysis has been added.
The BI market has so many vendors and products. How do you manage to draw comparisons?
In the “OLAP Survey” we compare economic benefits rather than features, and that is something all products should have in common. We base the analysis on eight economic benefits, including saving capital, reducing costs, increasing revenue and improving reporting. This lets us compare products that are not comparable in terms of their specific features.

You have just finished your research for the “OLAP Survey 5”. Did you come across any new trends that caught your attention?
One thing stands out more strongly than ever this year: a notable relationship between economic success and the project’s queries. I always knew this was a factor, but not that it was the most important one. The slower the queries turn out to be, the harder it is to sustain a direct relationship with the reported benefits. That, for example, is something I would never have imagined. Queries turn out to be the most severe problem of all. It is there and cannot be ignored. Company politics, on the other hand, which was a bigger problem four or five years ago, remains relevant but is no longer the most important problem. Another point worth noting is data quality, which has also improved considerably. The number of people-related problems also remains significant. In general, the most frequent people-related problems are slightly more serious than product problems, and product problems are worse than data problems.
But couldn’t these problems be solved through technology?
The problems are generally related to performance, which typically comes down to selecting the right product. Choosing a MOLAP product gives better performance than a ROLAP product, so it is an easy decision. If you have a MOLAP solution and a ROLAP solution that both do the job, you choose MOLAP because it will be faster to implement and its query performance will be better, which gives you an immediate return.
In your work, how many companies have you seen successfully standardize their BI and CPM applications on a single vendor?
I think forced standardization is a mistake, and it never lasts. Even if your decision to standardize was the right one, things always change. Perhaps a new product is launched by a different vendor. Perhaps the vendor you standardized on is acquired by another and the new owner starts making mistakes, as happens in most cases. Standardization may look good at the outset, but it is not worth trying to enforce it under these conditions.
Do you think OLAP and BI are matters for IT, for business professionals… or for both?
I think both. Business professionals cannot implement an enterprise solution on their own. Likewise, IT cannot successfully implement an enterprise solution by itself. In general I believe in the idea of a joint team led by the business, although that also depends on the type of application. Large-scale sales reporting from a data warehouse might be primarily IT. A financial application such as planning, budgeting, financial reporting or consolidation, however, should be primarily for the users, with IT support.
Analysts and consultancies can both help companies in their search for BI solutions. How do their approaches differ?
Consultants should spend their time working with their clients to uncover their specific needs. This is more of a personal relationship that can last days, weeks or perhaps even months. Analysts, for their part, offer general advice, because they have a broader sense of the market. Analysts also draw on their own work, but this is normally general, higher-level information. Most analysts clearly distinguish between the terms OLAP, BI and CPM.
Do you think the average user understands the difference?
I don’t understand the difference, and I very much doubt any user can! To me, all this terminology reflects the loose way vendors use it. Performance Management is the least well-defined term of all. Depending on who the vendor is, it covers different capabilities, but fundamentally they are all based on OLAP technology. And OLAP is a technology that will deliver different solutions depending on the tools in use. If you call it Performance Management, you will have budgeting, planning and “scorecards”, but the underlying technology is OLAP. That is why I stick with the well-defined term, OLAP, because at least I know what it means.
Where do you think the market will be in the next five years? Do you expect it to keep consolidating, or will only a few remain?
There will surely be more acquisitions. Oracle has clearly stated that it will buy more companies to enter new areas or simply to consolidate its control of the market. Consulting firms could be involved as well; however, I think there are too many vendors in the highest-demand sectors. And if you take a look at other industries, their number is shrinking. Microsoft obviously wants to remain a base-technology company, but it has not yet shown any ability to develop successful applications for the market. That is a reality that could change, but as long as it stays on this course, Oracle and Microsoft will not be able to achieve the supremacy I had predicted. Consequently I think it is still a profitable market for independent BI vendors, but not for 50; perhaps for 3, 4, 5 or 6, but not for a dozen of them.
Tags: Featured
First 100 Million Rows done in the “cloud”
Source: bayon blog [link]
My good friend, Matt Casters, posted his results from what we believe to be the first 100 Million Rows of data processed by an ETL tool in the new cloud computing paradigm. Matt Casters ran a simple 100 Million rows through Kettle on Amazon EC2.
I should really do a write up or review of EC2. I’m LOVIN’ it and others I’ve introduced to it are LOVIN’ it too! I just need some spare time (ha ha ha) to write it up.
Arrived in Zagreb
Source: Mark Rittman's Oracle Weblog [link]
I flew in this afternoon to Zagreb airport, and it appears I missed the good weather of a week or so ago to instead arrive during the middle of a particularly foggy patch. I’m staying in the Hotel Antunovic just outside the main part of the city, and this is also where the seminar is being held, so I don’t have to worry about taxis or getting to the venue in the morning.
I arrived at the hotel around 4.30pm, and for the first time, I had a bit of a welcoming committee, with the local Oracle Education manager and the hotel events organiser waiting for me after check-in. I don’t normally get someone meeting me at the hotel, and so I was a bit under-dressed in just jeans and a jumper, but I got a chance to look at the event venue and chat through the specifics of the Croatian market.
I’m just working through adding a bit of content around BI and SOA into the seminar slides, and also just finishing off the first chapter of the book I’ve got planned. I can’t say what it is yet, but the contracts have all been signed and sent off and hopefully, in a couple of weeks’ time, I’ll be able to announce what it’s all about. Until then, it’s off to the revolving restaurant on the eighth floor and then hopefully an early night.
Software Quality Reports for Jira 0.8.25
Source: bayon blog [link]
We’ve just released a beta cut of the Software Quality Solution for Jira. This project, sponsored by Pentaho, is a complete BI solution that runs on top of Pentaho and reports on Jira issue data.
Software Quality Reports for Jira is an analytic application; it provides classic slicing and dicing of issue data, along with helpful trend lines, custom reports, etc. Jira does a GREAT job at operational reporting (what is assigned to me) but isn’t set up to do ad hoc, complex, time series and historical reporting: things such as bug burndown, average days to close by product and priority, trend lines on bug balances, etc.
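To make one of those metrics concrete, here is a minimal sketch of “average days to close by product and priority” in plain Python. It is not how the Pentaho solution computes it; the record layout, the product names and the `average_days_to_close` helper are all hypothetical and only illustrate the kind of aggregation involved.

```python
from collections import defaultdict
from datetime import date

# Hypothetical issue records: (product, priority, opened, closed).
# Real data would come from the Jira database or an export.
ISSUES = [
    ("ProductA", "High", date(2006, 10, 2), date(2006, 10, 6)),
    ("ProductA", "High", date(2006, 10, 9), date(2006, 10, 19)),
    ("ProductA", "Low", date(2006, 9, 1), date(2006, 10, 1)),
    ("ProductB", "High", date(2006, 11, 1), date(2006, 11, 3)),
]


def average_days_to_close(issues):
    """Return {(product, priority): mean days between open and close}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of days, issue count]
    for product, priority, opened, closed in issues:
        bucket = totals[(product, priority)]
        bucket[0] += (closed - opened).days
        bucket[1] += 1
    return {key: days / count for key, (days, count) in totals.items()}


if __name__ == "__main__":
    averages = average_days_to_close(ISSUES)
    for (product, priority), avg in sorted(averages.items()):
        print(f"{product:<10} {priority:<5} {avg:.1f}")
```

An analytic tool does the same grouping across many dimensions at once (product, priority, time, person), which is what makes the slicing and dicing cheap for the end user.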
Here are some graphs that come “out” of the solution using the web based end user tool:



NOTE: These are reports built from the Jira installation Pentaho uses to track issues for our products, http://jira.pentaho.org:8080 a couple of days back.
This beta release is the first public release of the solution. We’ve had a customer using the solution, and we’ve been using it against our Jira data now for several months. In fact, we actually wrote the Jira build PRIOR to the Bugzilla build.
At this point, the primary goal is to collect feedback and set direction to make it more useful.
- What do you think? Is it useful?
- Is it worth the additional installation (Pentaho server) for reporting above and beyond reports in Jira?
- What do you want to see next? Dashboards, more reports, additional attributes on the Person dimension, etc?
Feedback here is fine, or email through to me ngoodman __ pentaho — ORG.
Hope you find it useful!
Business Intelligence Pure Plays: A stock rebound!
Source: Data Doghouse - performance management, business intelligence, and data warehousing [link]
The top BI pure-play vendors - Business Objects (BOBJ), Cognos (COGN) and Hyperion Solutions (HYSL) - all experienced pronounced declines in their stock prices this summer.
My blog post Business Objects: Ouch! preceded the 52 week low price point for shares of BOBJ and COGN, as well as the overall software index represented by the iShares IGV ETF. Since that time these stocks have rallied significantly with two out of three creating new 52-week highs in the last four weeks.
The following chart illustrates the three BI pure play stocks, the IGV ETF and the NASDAQ for the year-to-date (YTD).

In addition to the chart above, the following table lists these stocks in relation to their 52 week high and low:
What does the decline and rise mean in relation to the BI market and the individual BI companies? Will this impact software market consolidation?
Why did these stocks drop in the first place over the summer? The stock decline is based on a few factors.
First, many people became cautious of the overall stock market, and in particular high tech during the summer. Concerns over oil prices, housing market, capital spending, the Iraq war and the US mid-term elections put a damper on the rising stock market.
Second, expectations regarding the BI market from a software licensing perspective became more realistic in my opinion. BI growth is solid and continues to be so BUT the growth rates, especially in terms of software licenses, are generally slowing down. A company’s price-to-earnings (P/E) ratio is a reflection of people’s expectation for growth and some people felt that P/E ratios had grown too high for the BI pure-plays.
It’s a classic example of how, as a company and market get larger in terms of sales, the high growth rates become more subdued. As that happens a company’s P/E gets lower and often its stock price goes lower to reflect the slower growth. As the company’s sales and earnings rise then the stock price rises again.
Finally, each of the BI pure-plays is going through a product upgrade cycle that, as we have discussed earlier, is going slower than they and their investors would like. The slower software upgrade cycle is not a reflection of software quality or functionality but rather of a more prudent approach by customers to spending their IT budget and resources.
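The arithmetic behind that pattern follows from the identity price = P/E × earnings per share. The sketch below uses made-up numbers (not figures for BOBJ, COGN or HYSL) purely to show how a lower multiple pulls the price down and how earnings growth can bring it back.

```python
def implied_price(earnings_per_share, pe_ratio):
    """Share price implied by the identity: price = P/E x EPS."""
    return earnings_per_share * pe_ratio


# Hypothetical figures, for illustration only.
high_growth = implied_price(2.00, 30)       # market pays 30x earnings
after_rerating = implied_price(2.00, 24)    # same earnings, lower multiple
after_earnings_up = implied_price(2.50, 24) # earnings grow, multiple unchanged

if __name__ == "__main__":
    print(high_growth, after_rerating, after_earnings_up)
```

With these assumed inputs the price falls from 60 to 48 when the multiple contracts, then returns to 60 once earnings per share rise from 2.00 to 2.50 at the lower multiple.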
Why the significant rise?
First, sometimes all boats rise together (in this case high tech, software and other industries) and that is a partial explanation. The economy does not appear to be collapsing for now (although Wal-Mart sales drove the overall market and BI stocks down yesterday!) and expectations are that business capital spending will continue at its projected single-digit increases for this year and 2007. That’s not a boom but it certainly is not a bust (if the forecasts are correct).
Second, BI and performance management (PM) projects appear to be at the top of IT priorities for this and next year. And I would suggest this number may be understated since many business initiatives that have BI/PM components as necessary ingredients are not being counted in the BI category.
An important qualifier is that regardless of how high a priority a BI project is, if the economy slips into recession then these projects can be postponed just like many other IT projects.
Finally, the latest quarterly reports and financial analysts’ health checks of the BI pure-plays are positive, resulting in these firms reaching new 52-week highs.
Has anything changed in the BI market since the summer? Not really. The demand for BI across industry segments and company sizes remains strong. The demand is based on business, competitive and regulatory pressures rather than being technology-driven. Many data warehouse and BI projects in the past were built on the "If we build it, they will come" principle but that is generally not the case now.
Finally, a few crystal ball predictions on mergers and acquisitions: Although the best time, i.e. the cheapest time, to buy the BI pure-plays from a stock price perspective was clearly during the summer, their market caps are still within the price range of the large software or high tech companies, as well as private equity firms. All the top BI pure-plays are attractive acquisitions in terms of sales, profits, growth rates, customers, technology and people. Oracle, Microsoft, IBM and SAP are all possible buyers. And lately I have been thinking that HP should also consider acquiring BI and data warehouse companies. I have NOT heard any rumors regarding HP; this is just an idea that I believe would be a strategic move for HP.
High volume and low performance - what to do?
Source: Blog: Dan E. Linstedt [link]
If you’re like the rest of the world these days you’ve got an ever growing data set, and at the same time an ever shrinking processing window. This is not something you want to treat lightly. In most cases, you are also experiencing severe performance problems and don’t know how to deal with them, or haven’t been able to solve them. Well, there are ways and means by which performance can be improved. I’ve been teaching and consulting on the performance of VLDW integration systems for 10+ years, and there are techniques that can improve yours. The catch? You have to be willing to swallow the red pill (from the movie: The Matrix). Let’s just see how far down the rabbit hole goes…
