forget-me-not panties : with sensatech technology
Source: OLAP/BI/IM stuff [link]
forget-me-not panties : with sensatech technology: “These panties will monitor the location of your daughter, wife or girlfriend 24 hours a day, and can even monitor their heart rate and body temperature.
Based on pioneering research developed by the U.S. military at DARPA (Defense Advanced Research Projects Agency), we have brought this revolutionary technology, previously only available to the military, to you! “
OracleAppsBlog - A day in the life of an Oracle Applications Consultant
Source: OLAP/BI/IM stuff [link]
Surrogate key generation in SSIS
In the last few days I tested several ways to generate surrogate keys when loading a dimension into a data mart.
The classical approach with SQL Server is to rely on a INT IDENTITY column. It works fine, but you don’t know what surrogate key has been generated for a given application key until you read the row from the table. Not a big issue with DTS, because you can’t do very much (in a way faster than SQL itself) even if you know this number, but with SSIS if you want to push performances to the limit, you need to load a fact table within the dimension table(s) without reading what you just writed into a table.
The first step in this direction is to generate surrogate keys inside a SSIS package. If you do a complete dimension load at each run, you can solve the whole problem using a scripting component inside a data flow task. Read this article from Jamie Thomson to see how you can implement it. But life is never as easy as you could desire, so when you have to incrementally load a dimension table then you need to give to this script the right initial value (the last ID currently used instead of 0).
After several approaches discussed on the newsgroup, I came to the conclusion that the most affordable way to implement it is a not-so-desirable one:
- define a Execute SQL Task which execute a SELECT COALESCE( MAX( ID_Dimension ), 0 ) AS LastID FROM Dimension and put the single row ResultSet into a variable (let’s say we use User::ID_Dimension)
- define a Data Flow Task in the same way Jamie did in his article
- put the User::ID_Dimension variable into the ReadOnly Variables collection in the Script Component
- define PreExecute method inside the script to initialize the value for counter you will use to enumerate new surrogate keys (see code below)
Public
NT>Class ScriptMain
ONT> UserComponent
> counter As Int32
T> Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
counter = counter + 1
Row.IDDimension = counter
FONT> Sub
; T> Overrides Sub PreExecute()
T>.PreExecute()
counter = Variables.IDDimension
> Sub
End
NT>Class
That’s it. You can download the sample package here.
Now, don’t pretend this solution to be faster than a SQL Server INT IDENTITY column… by now, it isn’t, but we can’t judge this with current CTPs. But don’t desperate: it could be better in the future and, more important, if you really need to use the generated keys in other transformations you could already take advantage from this architecture.
Why it’s a not-so-desirable way? Well, I don’t like to rely on a variable defined in the control flow while all the dimension-load logic is inside a data flow task. Imagine how ugly could become a SSIS package if you load 20 or 30 dimensions; of course you can group two or more tasks together, but I consider this only as a work-around. There is another issue if you would like to update your somewhere-in-the-package variable with the last generated surrogate key, because you can’t use the same variable as ReadOnly and ReadWrite (in the Script Transform) and you can’t read a variable passed as ReadWrite outside the PostExecute method (I’m right: you can only read/write it inside PreExecute overridden method, you can read it in the PreExecute methos - as in my sample - only if variable is passed as ReadOnly).
I thought other ways to find a better solution (most of them has been suggested by kindly people on the newsgroup):
- Script transformation with two inputs: not supported (I would like to pass the real data source as Input1 and the last used ID as Input2)
- Cross Join between two inputs (defined as the previous idea)
- Custom data flow task (with one input and a property with table/column to look for the last ID) - it needs to be written
other possible variations on those themes. But I have to say that SSIS could be improved very much to make life easier for the BI developer: a real “definitive” surrogate key generation task could be the right answer to this kind of needs.
¿Está en crisis el software empresarial?

Esto es lo que dice este artículo en el IT Journal a propósito de la última caída de ventas de Siebel de un 40% en el último trimestre. Comenta que uno de los competidores que está erosionando las ventas de Siebel es salesforce.com, que aunque gartner (lo podéis leer en todoBI) no lo ve apropiado para grandes empresas, es configurable y ya hay compañías especializadas en hacer personalizaciones mas complejas. Pero como en internet los negocios vuelan, también comenta el artículo que a salesforce le quita ventas SugarCRM desarrollado en código abierto, al igual que CentricCRM. Yo modestamente creo que la crisis de Siebel no tiene que ver tanto con salesforce.com y otros sino más bien con que es un software de nicho, que ha tenido un crecimiento explosivo y habiendose llevado muchos grandes proyectos, ahora le cuesta seguir el ritmo porque ya no quedan grandes empresas dispuestas a pagar esos precios por unas funcionalidades menos imprescindibles que por ejemplo los sistemas contables que vende SAP. Pero Siebel seguirá vendiendo a empresas medianas y grandes. La aparición de software barato, o gratis como freeCRM.com que se financia con banners; lo que permite es que pequeñas empresas e incluso particulares, puedan disfrutar de un crm, o de una web corporativa con sus estadísticas, un servicio de correo, una suite de office..(tenéis más en la gacela).. Esto en mi opinión fomentará la aparición de emprendedores, que desde su casa con una banda ancha y un teléfono móvil, ya tendrán todo lo necesario para funcionar como empresa.
Off To Manchester…
Source: Mark Rittman's Oracle Weblog [link]
Well that’s it for the rest of the week for me. I’m off up to Manchester for
Jonathan Lewis’ "Optimising
Oracle, Performance By Design" seminar for the next three days, and as no
doubt I’ll be back in the hotel each evening trying out some of Jonathan’s ideas
on the laptop I’m definitely leaving the blog alone for the next few days. When
I’m back from the course I’ll go through some of the concepts that Jonathan
raises, and I’m particularly looking forward to an in-depth look at the
cost-based optimiser, latches and locks and parsing and optimiser costs. How sad
am I? Anyway, bye for now and I’ll be back next week.
Is RAC Suitable For Data Warehousing?
Source: Mark Rittman's Oracle Weblog [link]
Another one of our clients is looking to put some hardware in place for a new
data warehousing project. The two hardware options are:
- Buy a single, large Unix server (probably a HP box running HP-UX) with 8
CPUs - Buy two or more commodity "Lintel" boxes, with 2 or 4 processors,
running Red Hat Advanced Server, and use RAC to create a single database
instance.
But just how suitable is Real Application Clusters for data warehousing? The reason I started doubting RAC’s suitability was an article I came across entitled
"Oracle Real Application
Clusters and Data Warehouse Applications" which stated the following:
"Oracle has marketed the Real Application Clusters (RAC) option as
appropriate for all types of applications, but there is one very important
exception, Oracle data warehouses. Oracle RAC is used primarily for
continuous availability of mission-critical systems (i.e. banking
applications) and for transparent scalability of massive online transactions
systems (i.e. Amazon). However, it is not always the best architecture for
data warehouse that require large materialized view rollups because they
often require large, cohesive RAM regions (large data sorts) and
tightly-coupled CPU?s (fast parallel query) … In sum, Oracle Real
Application Clusters is a wonderful tool for mission-critical database that
must have continuous availability (using RAC with Transparent Application
Failover (TAF)) and for scalability of super-large OLTP systems, Oracle RAC
is not the best choice for data warehouse applications that require
high-speed table scan performance."
Now I know Don and Mike have spent quite a bit of time working with RAC, so I
thought it wise to look into this in a bit more detail. What I did in the end
was
post a link to the article on the
Dizwell Forum, and ask the forum participants for their experiences with RAC
and data warehousing. I won’t repeat the whole thread (which went on for three
pages, so it’s obviously an area of interest) but if you’re considering RAC for
a data warehousing project,
take a look at the thread and
Don’s
article, as they’re both useful bits of background reading. My conclusion? A
large, single server and a collection of clustered servers aren’t
interchangeable concepts; there could be issues with RAC with "edge cases", for
instance (as mentioned in the forum comments) where a parallel query needs to
run over multiple nodes, and the "large cohesive memory area" issue I still
think could be an problem if you need more memory than a single small server can
provide. The lesson I guess though is that there’s no "black and white" rule for
this sort of thing (not that the original article was suggesting this) and it’s
worth getting a bit of peer advice if you’re looking to get an answer to a
question such as this, with presumably big implications if you get it wrong.
From Web Services to SOA and Everything in Between: The Journey Begins Highlights - WebServices.Org
Source: OLAP/BI/IM stuff [link]
From Web Services to SOA and Everything in Between: The Journey Begins Highlights - WebServices.Org: “The survey found that while there are relatively few advanced Web services sites ? and even fewer bona fide SOA implementations ? managers and executives clearly understand the value proposition this new architecture has to offer. A large number of respondents are currently either deploying simple interfaces, or have new projects under development, the survey finds. Most businesses that do have efforts underway report they are seeing tangible benefits in terms of reuse, developer productivity, and cost savings.
The survey found that 35% of businesses have or plan to have a fully functioning Web services infrastructure in the near future. At this point, about 18% can be considered ?SOA-ready,? meaning they are part of an infrastructure of standardized, loosely coupled services. For most companies, the hard work still lies ahead. About 24% have simple point-to-point implementations, and 29% are still in the development stages. ”
This is a major trend with extreme potential for cost reductions, and its within a stone’s throw of reaching critical mass.
eCuaderno

No suelo mirar mucho las estadísticas, ya que no espero que un blog sobre un tema tan particular como el BI llegue a tener nunca más de 100 visitantes diarios, pero últimamente estaban un poco alteradas como la primavera, y el motivo es que un blog tan prestigioso como eCuaderno de Jose Luis Orihuela, me ha incluido entre una amplia lista de blogs recomendados en el apartado de innovación. Muchas gracias, estas cosas animan a no dejar de escribir.
“When Smart People Defend Bad Ideas”
Source: Mark Rittman's Oracle Weblog [link]
"Scott Berkun, Smart People, Bad Ideas" : "Scott Berkun, former Microsoftie,
just got
slashdotted with an essay about
"Why smart people defend
bad ideas". It’s a very interesting essay. I’m not crazy about his examples,
but his portrayal of the bad-idea-defender is dead on: "Until they come face
to face with someone who is tenacious enough to dissect their logic, and
resilient enough to endure the thinly veiled intellectual abuse they dish out
during debate (e.g. ‘You don?t really think that do you?’ or ‘Well if you knew
the
things’), they?re never forced to question their ability to defend bad ideas."
And then again: eople, or at least those whose brains have good first
gears, use their speed in thought to overpower others. They?ll jump between
assumptions quickly, throwing out jargon, bits of logic, or rules of thumb at a
rate of fire fast enough to cause most people to become rattled, and give in.
When that doesn?t work, the arrogant or the pompous will throw in some
belittlement and use whatever snide or manipulative tactics they have at their
disposal to further discourage you from dissecting their ideas."
Another good blog posting from
Adam Barr (author of
"Proudly Serving My Corporate Masters", my favourite "working at Microsoft"
book). I thought Adam’s Post, and Scott Berkun’s essay, were particularly
relevant considering the debates that have gone on regarding Oracle performance
tuning. Probably a good essay to read in conjunction with
"Crimes Against Logic".
Creating OLAP DML Formulas Using AWM Templates
Source: Mark Rittman's Oracle Weblog [link]
One of the most powerful features of the multidimensional engine behind
analytic workspaces is the ability to create formulas. Formulas, or "calculated
measures" as they’re referred to in AWM10g, are measures that are derived from
other measures. Using AWM, you can create simple formulas that reference other
measures in a cube, allowing you for example to create a "margin" measure
derived from sales and costs measures. If you’re an old Express hand though,
you’ll know that this simple type of formulas is just the tip of the iceberg,
and what you often used to end up doing was creating for example a three
dimensional formula based on measures from four and five dimensional variables,
rolling up unneeded dimensions and pulling in variables held in what would now
be referred to as "cubes".
To take an example, say that you had a table of branches:
SQL> select * from branches;
BRANCH_ID BRANCH_DESC REGION_ID REGION_DESC TOTAL_BRANCHES_ID TOTAL_BRANCHES_
---------- --------------- ---------- --------------- ----------------- ---------------
1 Brighton 10 South-East 100 All Branches
2 Worthing 10 South-East 100 All Branches
3 Charing Cross 11 London 100 All Branches
4 Liverpool 12 North-West 100 All Branches
5 Manchester 12 North-West 100 All Branches
and a table of accounts:
SQL> select * from accounts;
ACCOUNT_ID ACCOUNT_DESC TOTAL_ACCOUNTS_ID TOTAL_ACCOUNTS_
---------- --------------- ----------------- ---------------
1 Wages 100 All Accounts
2 Sales 100 All Accounts
and then a table of account balances:
SQL> select * from balances;
ACCOUNT_ID BRANCH_ID BALANCE
---------- ---------- ----------
1 1 50
2 1 80
1 2 40
2 2 40
1 3 100
2 3 125
1 4 80
2 4 60
1 5 90
2 5 95
10 rows selected.
We then create an analytic workspace with account and branch dimensions, and
a balances cube:

Once the analytic workspace has been loaded from our source tables, we can open up the OLAP Worksheet
and take a look at what’s been created.
-> listnames39 DIMENSIONs 52 VARIABLEs -------------------------------- -------------------------------- ACCOUNTS ACCOUNTS_ACCOUNT_H_HIERDEF ACCOUNTS_HIERLIST ACCOUNTS_ACCOUNT_LEVELDEF ACCOUNTS_LEVELLIST ACCOUNTS_COLUMN_COUNT AGGREGATE_DIMENSION_PROP ACCOUNTS_COLUMN_MAP AGGREGATE_GENERIC_PROP ACCOUNTS_CREATEDBY ALLOCATE_DIMENSION_PROP ACCOUNTS_HIER_IS_VALUE ALLOCATE_GENERIC_PROP ACCOUNTS_IS_SESSION ALL_ATTRIBUTES ACCOUNTS_LONG_DESCRIPTION ALL_ATTRTYPES ACCOUNTS_SHORT_DESCRIPTION ALL_CALC_MEMBERS ACCOUNTS_TOTAL_ACCOUNTS_LEVELDEF ALL_CUBES AGGREGATE_DIMENSION_CATALOG ALL_DESCTYPES AGGREGATE_GENERIC_CATALOG ALL_DIMENSIONS ALLOCATE_DIMENSION_CATALOG ALL_HIERARCHIES ALLOCATE_GENERIC_CATALOG ALL_LANGUAGES ALL_DESCRIPTIONS ALL_LEVELS ALL_TOOLS_PROP ALL_MEASUREFOLDERS ATTR_DATA_MAP ALL_MEASURES ATTR_VISIBLE ALL_MODELS AW_NAMES ALL_OBJECTS BALANCES_BALANCE_COUNTVAR ALL_SOLVEDFNS BALANCES_BALANCE_STORED ALL_SOLVEGROUPS BRANCHES_BRANCHES_H_HIERDEF ALL_SOLVES BRANCHES_BRANCH_LEVELDEF BALANCES BRANCHES_COLUMN_COUNT BRANCHES BRANCHES_COLUMN_MAP BRANCHES_HIERLIST BRANCHES_CREATEDBY BRANCHES_LEVELLIST BRANCHES_HIER_IS_VALUE CALC_MEMBER_PROP BRANCHES_IS_SESSION COLUMN_DIM BRANCHES_LONG_DESCRIPTION CUBE_PROP BRANCHES_REGION_LEVELDEF DIM_OBJ_LIST BRANCHES_SHORT_DESCRIPTION FORECAST_PROP BRANCHES_TOTAL_BRANCHES_LEVELDEF GEN_OBJ_ROLES CALC_MEMBER_CATALOG GID_DIMENSION CUBE_CATALOG IS_LOADED_DIMENSION DIMKEY_IS_UNIQUE MAPGROUP_DIM DIM_AW_OBJS MEASURE_PROP DIM_KEY_MAP TIME_GLEVEL_DIMENSION FORECAST_CATALOG TIME_OFFSET_DIMENSION GEN_AW_OBJS MEASURE_CATALOG MEAS_DATA_MAP MEAS_KEY_MAP MEAS_OPERATOR_MAP OBJECT_LOADED OBJ_CREATEDBY OBJ_ORIGINATOR PARENT_KEY_MAP PARENT_LVL_MAP SOLVEDFN_TYPE SOLVE_MEMBER_SELECTION VISIBLE ___XML_USER_AW_VERSION 1 PROGRAM 1 FORMULA -------------------------------- -------------------------------- ONATTACH BALANCES_BALANCE 37 RELATIONs 5 COMPOSITEs -------------------------------- -------------------------------- ACCOUNTS_FAMILYREL ATTR_MAP_COMPOSITE ACCOUNTS_FAMILYRELVAL BALANCES_COMPOSITE ACCOUNTS_GID HIERLVL_MAP_COMPOSITE ACCOUNTS_LEVELREL LVL_MAP_COMPOSITE ACCOUNTS_LOADED MEAS_MAP_COMPOSITE ACCOUNTS_PARENTREL BRANCHES_FAMILYREL BRANCHES_FAMILYRELVAL BRANCHES_GID BRANCHES_LEVELREL BRANCHES_LOADED BRANCHES_PARENTREL CALC_MEMBER_BASE_DIMENSION CUBE_AGGREGATION CUBE_DFLT_PARTITION_HIERARCHY CUBE_DFLT_PARTITION_LEVEL CUBE_MEASURES DEFAULT_HIER DIM_ATTRIBUTES DIM_HIERARCHIES DIM_LEVELS DYNAMIC_MEAS_AGGREGATION FOLDER_PARENTREL HIER_SORT_ATTR MAPGROUP_CUBEREL MAPGROUP_DIMREL MAPGROUP_HIERREL MAPGROUP_LVLREL MEAS_DOMAIN MEAS_PARTITION_HIERARCHY MEAS_PARTITION_LEVEL MODEL_BASE_DIMENSION RELATIONAL_ATTRIBUTE_DATA RELATIONAL_MEASURE_DATA SOLVE_BASE_MEAS SOLVE_SOLVEDFN SOLVE_SOURCE_MEAS 2 MODELs 22 VALUESETs -------------------------------- -------------------------------- BALANCES_ACCOUNTS_AWXMLMODEL ACCOUNTS_AGGRDIM_VSET BALANCES_BRANCHES_AWXMLMODEL ACCOUNTS_AGGRHIER_VSET ACCOUNTS_HIER_LEVELS ACCOUNTS_INHIER ACCOUNTS_LOAD_STATUS_VSET BRANCHES_AGGRDIM_VSET BRANCHES_AGGRHIER_VSET BRANCHES_HIER_LEVELS BRANCHES_INHIER BRANCHES_LOAD_STATUS_VSET CALC_MEMBERS_IN_MODEL CALC_MEMBER_OTHER_DIMENSIONS CUBE_COMPOSITE_BASES CUBE_DIMENSIONS DEPENDENT_MEASURES MEAS_COMPOSITE_BASES MEAS_IN_FOLDER MODEL_OTHER_DIMENSIONS SOLVEDFN_CALCULATION_ORDER SOLVEDFN_SOLVE_ORDER SOLVE_ORDER SOLVE_TARGET_MEAS 4 AGGMAPs 11 SURROGATEs -------------------------------- -------------------------------- OBJ1962518006 ACCOUNTS_ACCOUNT_SURR OBJ1962518006_PRT_PRTAGGMAP ACCOUNTS_HIERLIST_SURR OBJ1962518006_PRT_RUNAGGMAP ACCOUNTS_LEVELLIST_SURR OBJ1962518006_PRT_TOPAGGMAP ACCOUNTS_TOTAL_ACCOUNTS_SURR BRANCHES_BRANCH_SURR BRANCHES_HIERLIST_SURR BRANCHES_LEVELLIST_SURR BRANCHES_REGION_SURR BRANCHES_TOTAL_BRANCHES_SURR __XML_GENERATED_1 __XML_GENERATED_2
What we’ve got here is the two dimensions we’ve created (ACCOUNTS and
BRANCHES), the measure BALANCES_BALANCE_STORED, and a whole load of additional
objects that make up the standard form metadata. The measure, which we called
BALANCES in the AWM Model view, is named within the AW using the format
CUBENAME_MEASURENAME_STORED.
We can then take a look at the ACCOUNTS dimension that has been set up,
listing out the member ID (taken from our ACCOUNT_ID source column) and the long
description (taken from the ACCOUNT_DESC source column).
->rpr down accounts w 30 accounts_short_description
--ACCOUNTS_SHORT_DESCRIPTION--
--------ALL_LANGUAGES---------
ACCOUNTS ENGLISH_UNITED KINGDOM
-------------- ------------------------------
TOTAL_ACCOUNTS All Accounts
_100
ACCOUNT_1 Wages
ACCOUNT_2 Sales
Do the same for the BRANCHES dimension,
->rpr down branches w 30 branches_short_description
--BRANCHES_SHORT_DESCRIPTION--
--------ALL_LANGUAGES---------
BRANCHES ENGLISH_UNITED KINGDOM
-------------- ------------------------------
TOTAL_BRANCHES All Branches
_100
REGION_10 South-East
REGION_11 London
REGION_12 North-West
BRANCH_1 Brighton
BRANCH_2 Worthing
BRANCH_3 Charing Cross
BRANCH_4 Liverpool
BRANCH_5 Manchester
and then list out the contents of the measure.
->rpr balances_balance_stored
-------------------------------------BALANCES_BALANCE_STORED--------------------------------------
---------------------------------------------BRANCHES---------------------------------------------
TOTAL_BRAN
ACCOUNTS CHES_100 REGION_10 REGION_11 REGION_12 BRANCH_1 BRANCH_2 BRANCH_3 BRANCH_4 BRANCH_5
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
TOTAL_ACCOUNTS 760.00 210.00 225.00 325.00 130.00 80.00 NA NA NA
_100
ACCOUNT_1 360.00 90.00 100.00 170.00 50.00 40.00 100.00 80.00 90.00
ACCOUNT_2 400.00 120.00 125.00 155.00 80.00 40.00 125.00 60.00 95.00
Now, say that we wanted to create a new measure, that contained the
percentage of sales that wages represented. This measure would have one
dimension, BRANCHES, and would be calculated by taking the BALANCES measure for
each branch and dividing wages by sales then multiplying by 100. We could do
this at the relational end, creating a new table for this measure, calculating
the percentage and then loading it into a RATIOS cube. Old Express hands though
would create a formula instead, dimensioned by BRANCHES, that derived the value
from the BALANCES measure.
->define wages_pct_of_sales formula decimal <BRANCHES> ->eq (BALANCES_BALANCE_STORED(ACCOUNTS 'ACCOUNT_1') / BALANCES_BALANCE_STORED(ACCOUNTS 'ACCOUNT_2'))*100 ->update ->commit
which when queried would give the correct results:
->rpr wages_pct_of_sales
WAGES_PCT_
BRANCHES OF_SALES
-------------- ----------
TOTAL_BRANCHES 90.00
_100
REGION_10 75.00
REGION_11 80.00
REGION_12 109.68
BRANCH_1 62.50
BRANCH_2 100.00
BRANCH_3 80.00
BRANCH_4 133.33
BRANCH_5 94.74
The problem with this approach though, is when you’re working with Oracle
OLAP and analytic workspaces creating the fornula isn’t enough - to display it
as a calculated measure in AWM and Discoverer for OLAP, you’ve got to create all
the associated standard form metadata. Now whilst this is undoubtedly possible,
it’s by no means a simple affair (I’ve yet to get this working) and the metadata
itself changes from release to release. Therefore, what you’ve got to do is take
your formula definition and process it through AWM. Thanks for Anthony Waite and
Bud Endress for explaining how this takes place.
The first step is to create a new cube, in my case called RATIOS, that will
hold my new calculated measure. This cube will have one dimension, BRANCHES, as
ACCOUNTS is being rolled up into the ratio. Note that the cube has no measures.

The next step then is to use a text editor to create an AWM template file. In
my case, the template looked like this:
<Create Id="Action315475">
<ActiveObject>
<DerivedMeasure Name="WAGES_PCT_OF_SALES" LongName="Wages % of Sales"
ShortName="Wages/Sales" PluralName="Wages % of Sales"
Id="RATIOS.WAGES_PCT_OF_SALES.MEASURE" DataType="decimal" isInternal="false"
UseGlobalIndex="false" ForceCalc="false" ForceOrder="false"
SparseType="STANDARD" AutoSolve="DEFAULT"
ExpressionText="(BALANCES_BALANCE_STORED(ACCOUNTS ‘ACCOUNT_1′) / BALANCES_BALANCE_STORED(ACCOUNTS ‘ACCOUNT_2′))*100"/>
</ActiveObject>
</Create>
The bits in bold are the bits that I had to change to suit my formula. Two
points to note on this:
- The ID is made up of CUBE_NAME.FORMULA_NAME.MEASURE, where CUBE_NAME is
the name of the one dimension cube I just set up, FORMULA_NAME is the name
of my formula, and "MEASURE" is just a literal - i.e. just type in
"MEASURE". - The ExpressionText is the text of the formula definition, minus the "eq"
at the start.
Save the template, and then right-click on the calculated measures node in
the RATIOS cube, and create a calculated measure from the template file. Once
the template is loaded, the new calculated measure should be visible within AWM.

Now, if you use the View Data option to look at the calculated measure within
AWM, you should see the values as expected.

Note as well how the aggregation has been carried out for you,
properly aggregating the percentages rather than just SUMming them up. As it’s a
formula, there’s no need to separately process this cube - as long as the base
data has been loaded and aggregated, the formula then picks up the values and
displays them without further processing.
Finally, if I then go into the OLAP Worksheet again, I can view
my formula and then display the values.
->listnames formulas
2 FORMULAs
-------------------------
BALANCES_BALANCE
RATIOS_WAGES_PCT_OF_SALES
->rpr ratios_wages_pct_of_sales
RATIOS_WAG
ES_PCT_OF_
BRANCHES SALES
-------------- ----------
TOTAL_BRANCHES 90.00
_100
REGION_10 75.00
REGION_11 80.00
REGION_12 109.68
BRANCH_1 62.50
BRANCH_2 100.00
BRANCH_3 80.00
BRANCH_4 133.33
BRANCH_5 94.74
The points to take away from this are that firstly, you’re not restricted to
just the calculations that AWM gives you (as long as you know a smattering of
OLAP DML) and that secondly, if you want to manually create these calculations,
don’t try and create them using the OLAP Worksheet and OLAP DML, use AWM
templates instead as the standard form metadata is automatically created for
you, and the definition of the calculation is then preserved in the AW template
and can be saved along with the rest of the AW definition.
Another Batch Of New Oracle Blogs
Source: Mark Rittman's Oracle Weblog [link]
Another new batch of Oracle blogs:
Jeff Hunter (not to be confused with Jeffrey Hunter),
Robert Vollman, Frank van Bortel and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.
Another Batch Of New Oracle Blogs
Source: Mark Rittman's Oracle Weblog [link]
Another new batch of Oracle blogs:
Jeff Hunter,
Robert Vollman, Frank van Bortel and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.
IT Manager’s Journal | Is the enterprise software licensing business dying?
Source: OLAP/BI/IM stuff [link]
Another Batch Of New Oracle Blogs
Source: Mark Rittman's Oracle Weblog [link]
Another new batch of Oracle blogs:
Jeff Hunter (make sure you’re not holding a hot cup of coffee when you read
this one),
Robert Vollman, Frank van Bortel and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.
Another Batch Of New Oracle Blogs
Source: Mark Rittman's Oracle Weblog [link]
Another new batch of Oracle blogs:
Jeff Hunter (make sure you’re not holding a hot cup of coffee when you read
this one),
Robert Vollman and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.
“Oracle 10g Data Warehousing”
Source: Mark Rittman's Oracle Weblog [link]
I’ve
been meaning to post a plug for this one for a while.
"Oracle 10g Data Warehousing" by Susan Hillson, Shilpa Lawande, Pete Smith
and Lilian Hobbs came out a while ago and is the update to the 9iR2 Data
Warehousing book that I talked about on my Books page. Although the 9iR2 book
was good, I thought it was a bit thin and didn’t really cover much that wasn’t
already in the manuals, however the 10g book is far more substantial and covers
a lot of ground that you don’t normally come across when going through the OWB,
Discoverer and Oracle OLAP documentation. Like the 9iR2 book it covers all of
the products that make up the 10g BI&W product stack, but what’s good about this
version is the degree to which it covers some of the stuff that you tend to
gloss over when working with the tools - stuff like taking backups, using
Enterprise Manager, using the new 10g features such as ASM - and as well has a
big section on using the new analytic and aggregating functions (cube, grouping
sets, rollup) that you get with 9i and 10g.
I’m usually fairly sceptical about printed books as there’s not much you
can’t already get on docs.oracle.com and asktom.oracle.com; however in practice
we’ve found this book invaluable and it’s become standard issue on a number of
big 10g data warehousing projects that we’re working on. The only criticism I
would say is that it only covers the first 10g release of the various tools -
Discoverer, OWB etc - and doesn’t therefore have anything on Discoverer "Drake"
or of course OWB "Paris". Also, like every Oracle BI&W book, the coverage of
Oracle OLAP is fairly sketchy, with nothing on the new 10.1.0.3 or 10.1.0.4
features, but of course that therefore leaves things open for my forthcoming
magnum opus
Funnily enough I was one of the technical reviewers for this book, as I
stepped in at the last minute to tech review a couple of Pete Smith’s chapters.
Pete and I happened to be working together at the same client (for different
companies), with Pete as the DBA (dropping and recreating tables) and me as the
OWB developer (trying to keep OWB going with all my tables appearing and
disappearing). One thing that struck me at the time was the amount of work that
goes into writing a book - a typical blog article is for example an afternoon’s
work, say a day with the research and putting together articles - whilst a book
chapter, at around 20-30 pages, and where you have to get everything just
right, is a lot more effort. The other issue around book writing is the fact
that the technical content has to be completely correct - I’ve experienced this
now with writing for OTN, where articles are technically reviewed by the Oracle
product team - and this review and revision process, whilst ensuring that
content is absolutely spot on, certainly takes up a lot of time - as much time
actually as the time it took to write the article in the first place.
Anyway, I’d thoroughly recommend the book, an excellent grounding in the
technology and processes around building a 10g data warehouse, and an excellent
introduction to 10g new features for those DW developers and DBAs up until now
working with 8i and 9i.
Getting To The Bottom Of NOLOGGING, NOARCHIVELOG and Standby Databases
Source: Mark Rittman's Oracle Weblog [link]
One of the customers I work with is looking to us for advice on how to set
their data warehousing database up. They’re looking to use the Enterprise
Edition of the latest Oracle version (10.1.0.4) and are planning on using Oracle
Warehouse Builder 10g as the ETL tool. Each environment (dev, test, prod) will
have two database instances, one to hold the OWB Design repository (with an 8k
block size) and one to hold the target data warehouse and the OWB Runtime
Repository (with a 16K block size). Setting up the design repository instance
isn’t an issue, it’s just a regular database set up using the default database
template, but the target data warehouse instance is a bit more interesting as
we’re looking to minimize the amount of redo log generated during the warehouse
initial build and then refresh.
Most people are aware that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces which under certain circumstances
causes redo generation to be switched off when the object is loaded. You’re also
probably aware that the database itself can be placed in NOARCHIVELOG mode,
which disables the process of redo log archiving which can otherwise slow down a
bulk load into a warehouse. However, what’s often not clear is which of these
options should be used and why you would use one rather than the other, and I
therefore thought it worthwhile going back over the documentation and trying to
set out what the optimal set of choices are. As usual, if you’re reading this
and you spot something that I’ve got wrong, or can add anything that makes the
process clearer, add a comment at the end and I’ll update the posting.
To put this in context, the client runs a number of other Oracle databases,
almost exclusively as the backend for their transactional systems, and the
general rule is that these databases are run in ARCHIVELOG mode, incremental hot
backups are taken during the week using RMAN followed by full hot backups at the
weekend, and the archived redo logs are stored off site and also used to refresh
their DR databases. Given a preference, they would rather run the new data
warehouse database in ARCHIVELOG mode, so that it can fit in with their existing
arrangements, but they are willing to perform the warehouse load and refresh in
NOLOGGING mode as long as we give them some downtime after the load to do a
full, cold backup of the warehouse.
What we would like to do is to somehow disable redo log generation for
initial loads into our dimension tables, and for initial and then subsequent
loads into the fact tables. The rationale for this is that the initial dimension
load, and all loads into the fact table, load a very large amount of data and we
can speed up this process if we don’t have to write redo information into the
redo log files - in other words, we dramatically reduce the amount of disk I/O
we need to perform to load a given object. If our database is running in
ARCHIVELOG mode, we get an additional benefit in that we don’t then need to
archive off these redo log files, again reducing the time to perform the load as
we don’t have to wait around for the ARCn process to archive the filled-up redo
log files. The downside of doing this though is that these data loads are
unrecoverable - that is, if the instance crashes or we have a media (disk)
failure, we wouldn’t be able to recoverer these objects. Whilst that isn’t a bit
deal for the objects we’re working with - we can just reload them from the
source files - it means that we’re going to have problems with any other objects
that are also in that datafile. What this means in practice is that we have to
do a full, cold backup of the datafiles that contain these objects before we can
consider the database "recoverable".
I mentioned earlier on that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces. Ignoring tablespaces for the
moment, what this means in practice is that you can create the table (for
example) using the NOLOGGING clause:
CREATE TABLE "PRODUCT"
(
"CLASS_DSC" VARCHAR2(50),
"CLASS_ID" NUMBER,
"FAMILY_DSC" VARCHAR2(50),
"FAMILY_ID" NUMBER,
"ITEM_BUYER" VARCHAR2(50),
"ITEM_DSC" VARCHAR2(50),
"ITEM_ID" NUMBER,
"ITEM_MARKETING_MANAGER" VARCHAR2(50),
"TOTAL_PRODUCT_DSC" VARCHAR2(50),
"TOTAL_PRODUCT_ID" NUMBER)
NOLOGGING
;
Then, you can carry out bulk loads into this table in NOLOGGING mode. For
example:
INSERT
/*+ APPEND */
INTO
"PRODUCT"
("CLASS_DSC",
"CLASS_ID",
"FAMILY_DSC",
"FAMILY_ID",
"ITEM_BUYER",
"ITEM_DSC",
"ITEM_ID",
"ITEM_MARKETING_MANAGER",
"TOTAL_PRODUCT_DSC",
"TOTAL_PRODUCT_ID")
(SELECT
"PRODUCT_DIM"."CLASS_DSC" "CLASS_DSC",
"PRODUCT_DIM"."CLASS_ID" "CLASS_ID",
"PRODUCT_DIM"."FAMILY_DSC" "FAMILY_DSC",
"PRODUCT_DIM"."FAMILY_ID" "FAMILY_ID",
"PRODUCT_DIM"."ITEM_BUYER" "ITEM_BUYER",
"PRODUCT_DIM"."ITEM_DSC" "ITEM_DSC",
"PRODUCT_DIM"."ITEM_ID" "ITEM_ID",
"PRODUCT_DIM"."ITEM_MARKETING_MANAGER" "ITEM_MARKETING_MANAGER",
"PRODUCT_DIM"."TOTAL_PRODUCT_DSC" "TOTAL_PRODUCT_DSC",
"PRODUCT_DIM"."TOTAL_PRODUCT_ID" "TOTAL_PRODUCT_ID"
FROM "PRODUCT_DIM" "PRODUCT_DIM"
);
One point to note here is that the NOLOGGING clause doesn’t mean that
every DML operations on the table works in NOLOGGING mode - it’s only bulk
operations like INSERT /*+ APPEND */ and Direct Loader (SQL*Loader) that take
advantage of it. Regular INSERTs, DELETEs and UPDATEs will generate normal redo,
even when you specify NOLOGGING. This is a popular misconception as logic would
suggest that the NOLOGGING table creation clause would suppress all redo log
generation, not just direct path loads, but all the NOLOGGING clause does is
specify NOLOGGING for the table creation process itself (in case you’re using
CREATE TABLE … AS SELECT) and for any subsequent direct path insertions.
Another misconception is that NOLOGGING suppresses all redo, even when using
direct path loads. What actually happens is that NOLOGGING operations generate
minimal redo, not no redo whatsoever, with this minimal redo generated so that
the data dictionary itself is preserved. It’s only a relatively small amount,
but it’s worth noting that some will be produced, whatever steps you take.
With indexes and partition creation, it’s a similar story - you can specify
NOLOGGING in the object creation clause, and then certain operations (in the
case of indexes, creation and ALTERs (rebuilds)) can take advantage of NOLOGGING.
So what about tablespaces? What do you get when you specify the NOLOGGING
clause when creating a tablespace?
CREATE TABLESPACE dims
NOLOGGING
DATAFILE ‘c:\oradata\dims01.dbf’ SIZE 50M
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 256K
SEGMENT SPACE MANAGEMENT AUTO
ONLINE;
What this actually does is ensure that the default LOGGING|NOLOGGING
attribute for objects created in this tablespace is set to NOLOGGING. You can
still subsequently create a table as LOGGING in a tablespace you define as
NOLOGGING, all the NOLOGGING attribute does for a tablespace is set the default
for objects later created in the tablespace.
Once you’ve bulk loaded your data, created your table using CREATE TABLE …
AS SELECT or rebuilt your index, the important next step is to perform a
baseline backup of the datafiles that contain your objects; once you’ve done
this, and until such time as you carry out any other unrecoverable DML, you can
then recover your objects like any other. This baseline backup
can be a hot backup (as long as you’re in ARCHIVELOG mode, which in our case
we are) and this backup then ensures that we have a baseline copy of the object
within our backup set if we then need to perform a full recovery.
So where does the ARCHIVELOG setting for databases come into things?
The general advice for setting up warehouse target databases is to put the
database into NOARCHIVELOG mode. NOARCHIVELOG suppresses the archiving of redo
log files and improves the performance of data loads as you don’t have the ARCn
processes running around in the background and holding things up as they copy
the redo log files off into your archive locations. The obvious cost of this is
that you will only have limited options when recovering your instance -
basically if you’ve overwritten your redo log files and not had the contents
archived off, and you have a media failure, you can only restore your instance
back to the last full cold backup you took (as the redo required to roll forward
from this point has probably been overwritten, and you didn’t archive it off
beforehand). However, for data warehouses, this isn’t usually an issue, as you
can just reload from the source files or databases, and the benefits of your
data load being quicker are usually greater than the (slim) chance that you’ll
need to perform a media recovery.
What we have here then are two separate concepts - ARCHIVELOG and
NOARCHIVELOG mode, which determines whether redo logs are archived off when they
fill up, or just overwritten and the redo then lost, and LOGGING | NOLOGGING,
where we either allow all redo to be generated as normal, or in certain
circumstances (the important ones, when we need to load lots of data) suppress
the majority of redo, at the cost of our operation being unrecoverable and
subsequently needing to reload the objects from the source files. These two
concepts interact in two ways:
- If we’re looking to suppress as much redo as possible, chances are we’d
like to suppress archiving as well (and we might as well, as the objects
we’re loading are unrecoverable anyway), and - If you specify NOARCHIVELOG mode for a database, all indexes, tables and
partitions are
automatically set to NOLOGGING, and therefore all INSERT /*+ APPEND */
and SQL*Loader direct load operations will automatically generate minimal
redo.
However, in our instance, the DBAs want to keep the database in ARCHIVELOG
mode, so that they can manage the database in the same way as the OLTP databases
and ship the archivelogs off to their DR site to maintain their standby
databases. As I was reading through the various articles and questions on Asktom,
I noticed
this answer which suggested that there could be some complications when
using NOLOGGING operations when you later on wanted to ship the archivelogs to a
standby database:
"when using a standby database for disaster recovery — you cannot
perform nologging operations without performing additional, manual work (in
fact, in 9i,
we can set a flag that says "ignore the request for nologging — log it
anyway" on the database to avoid this issue alltogether).Please read the standby database documention — it does explain this in some
detail:"
The answer then points through to the
Oracle 8.1.6 documentation for Standby Database which talks about the
problems you’ll get when you try and apply the (minimal) redo logs that were
written for our unrecoverable transactions to your standby database:
"In some SQL statements, the user has the option of specifying
the NOLOGGING clause, which indicates that the database operation is not
logged in the redo log file. Even though the user specifies the NOLOGGING
clause, a redo log record is still written to the redo log. However, when
the redo log file is transferred to the standby site and applied to the
standby database, a portion of the datafile is unusable and marked as being
unrecoverable. When you either activate the standby database, or open the
standby database with the read-only option, and attempt to read the range of
blocks that are marked as "UNRECOVERABLE," you will see error messages
similar to the following:ORA-01578: ORACLE data block corrupted (file # 1, block # 2521) ORA-01110: data file 1: '/vobs/oracle/dbs/stdby/tbs_1.f' ORA-26040: Data block was loaded using the NOLOGGING option"
good, and certainly I can’t imagine our DBA chuckling to
himself when at 2.30 in the morning he’s called to recover our database and that
error message comes up. It’s not as bad as it seems actually, as the article
then goes on to explain how you recover from this situation, but the point that
Tom was trying to make is that NOLOGGING isn’t something you should do and then
tell no-one about - you should understand what it does, talk it through with the
DBA and have a plan in mind for when you do eventually need to perform a
recovery and work around the fact that you may have some objects that are
unrecoverable - if you’ve done your baseline backup after your bulk load, it
shouldn’t affect you, but if your data centre gets hit by a meteorite then it
may well affect the process of starting up your standby database. Given that
Standby Database is now Data Guard with 10g, I checked the 9i Data Guard
documentation (the 10g documentation seems to be offline at the moment) and
found this comment, which suggests that in fact we might want
to actually reconsider our use of NOLOGGING operations:
“For logical standby databases, when SQL apply operations
encounter a redo log record for an operation performed with theNOLOGGINGclause, it skips over the record and continues applying
changes from later records. Later, if an attempt is made to access one of
the records that were updated withNOLOGGINGin effect, the
following error is returned:ORA-01403 no data foundTo recover after the
NOLOGGINGclause is
specified, re-create one or more tables from the primary database, as
described in
Section 9.1.6.Note: In general, use of the
NOLOGGING
clause is not recommended. Optionally, if you know in advance
that operations using theNOLOGGINGclause will be
performed on certain tables in the primary database, you might
want to prevent the application of SQL statements associated
with these tables to the logical standby database by using the
DBMS_LOGSTDBY.SKIP procedure."
So, in summary, in our situation where we want to minimise the amount of redo
generated for our dimension and fact table loads, but we want to work in
ARCHIVELOG mode and ship our logs to a standby database, I’d say the best
approach would be to:
- Keep the database in ARCHIVELOG mode
- Create separate tablespaces, with their own datafiles, for the
dimensions and facts that we are going to bulk load, and create these
tablespaces as NOLOGGING - Create the tables to implement these dimensions and facts as normal
(i.e. without the NOLOGGING clause, as this is already the default for their
tablespaces) - Use INSERT /*+ APPEND */ to
bulk load our data into these tables - Once the loads have completed, do a baseline hot backup of the datafiles
that relate to the tables/tablespaces we’ve just loaded - At this point, our main database is completely recoverable.
- If we need to recover the standby database, be prepared for the fact
that some manual work might be needed to activate the standby database, and
in fact be prepared for the DBA to actually stop us from using NOLOGGING
through the use of the DBMS_LOGSTDBY.SKIP procedure.
Any comments?
Getting To The Bottom Of NOLOGGING, NOARCHIVELOG and Standby Databases
Source: Mark Rittman's Oracle Weblog [link]
One of the customers I work with is looking to us for advice on how to set
their data warehousing database up. They’re looking to use the Enterprise
Edition of the latest Oracle version (10.1.0.4) and are planning on using Oracle
Warehouse Builder 10g as the ETL tool. Each environment (dev, test, prod) will
have two database instances, one to hold the OWB Design repository (with an 8k
block size) and one to hold the target data warehouse and the OWB Runtime
Repository (with a 16K block size). Setting up the design repository instance
isn’t an issue, it’s just a regular database set up using the default database
template, but the target data warehouse instance is a bit more interesting as
we’re looking to minimize the amount of redo log generated during the warehouse
initial build and then refresh.
Most people are aware that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces which under certain circumstances
causes redo generation to be switched off when the object is loaded. You’re also
probably aware that the database itself can be placed in NOARCHIVELOG mode,
which disables the process of redo log archiving which can otherwise slow down a
bulk load into a warehouse. However, what’s often not clear is which of these
options should be used and why you would use one rather than the other, and I
therefore thought it worthwhile going back over the documentation and trying to
set out what the optimal set of choices are. As usual, if you’re reading this
and you spot something that I’ve got wrong, or can add anything that makes the
process clearer, add a comment at the end and I’ll update the posting.
To put this in context, the client runs a number of other Oracle databases,
almost exclusively as the backend for their transactional systems, and the
general rule is that these databases are run in ARCHIVELOG mode, incremental hot
backups are taken during the week using RMAN followed by full hot backups at the
weekend, and the archived redo logs are stored off site and also used to refresh
their DR databases. Given a preference, they would rather run the new data
warehouse database in ARCHIVELOG mode, so that it can fit in with their existing
arrangements, but they are willing to perform the warehouse load and refresh in
NOLOGGING mode as long as we give them some downtime after the load to do a
full, cold backup of the warehouse.
What we would like to do is to somehow disable redo log generation for
initial loads into our dimension tables, and for initial and then subsequent
loads into the fact tables. The rationale for this is that the initial dimension
load, and all loads into the fact table, load a very large amount of data and we
can speed up this process if we don’t have to write redo information into the
redo log files - in other words, we dramatically reduce the amount of disk I/O
we need to perform to load a given object. If our database is running in
ARCHIVELOG mode, we get an additional benefit in that we don’t then need to
archive off these redo log files, again reducing the time to perform the load as
we don’t have to wait around for the ARCn process to archive the filled-up redo
log files. The downside of doing this though is that these data loads are
unrecoverable - that is, if the instance crashes or we have a media (disk)
failure, we wouldn’t be able to recoverer these objects. Whilst that isn’t a bit
deal for the objects we’re working with - we can just reload them from the
source files - it means that we’re going to have problems with any other objects
that are also in that datafile. What this means in practice is that we have to
do a full, cold backup of the datafiles that contain these objects before we can
consider the database "recoverable".
I mentioned earlier on that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces. Ignoring tablespaces for the
moment, what this means in practice is that you can create the table (for
example) using the NOLOGGING clause:
CREATE TABLE "PRODUCT"
(
"CLASS_DSC" VARCHAR2(50),
"CLASS_ID" NUMBER,
"FAMILY_DSC" VARCHAR2(50),
"FAMILY_ID" NUMBER,
"ITEM_BUYER" VARCHAR2(50),
"ITEM_DSC" VARCHAR2(50),
"ITEM_ID" NUMBER,
"ITEM_MARKETING_MANAGER" VARCHAR2(50),
"TOTAL_PRODUCT_DSC" VARCHAR2(50),
"TOTAL_PRODUCT_ID" NUMBER)
NOLOGGING
;
Then, you can carry out bulk loads into this table in NOLOGGING mode. For
example:
INSERT
/*+ APPEND */
INTO
"PRODUCT"
("CLASS_DSC",
"CLASS_ID",
"FAMILY_DSC",
"FAMILY_ID",
"ITEM_BUYER",
"ITEM_DSC",
"ITEM_ID",
"ITEM_MARKETING_MANAGER",
"TOTAL_PRODUCT_DSC",
"TOTAL_PRODUCT_ID")
(SELECT
"PRODUCT_DIM"."CLASS_DSC" "CLASS_DSC",
"PRODUCT_DIM"."CLASS_ID" "CLASS_ID",
"PRODUCT_DIM"."FAMILY_DSC" "FAMILY_DSC",
"PRODUCT_DIM"."FAMILY_ID" "FAMILY_ID",
"PRODUCT_DIM"."ITEM_BUYER" "ITEM_BUYER",
"PRODUCT_DIM"."ITEM_DSC" "ITEM_DSC",
"PRODUCT_DIM"."ITEM_ID" "ITEM_ID",
"PRODUCT_DIM"."ITEM_MARKETING_MANAGER" "ITEM_MARKETING_MANAGER",
"PRODUCT_DIM"."TOTAL_PRODUCT_DSC" "TOTAL_PRODUCT_DSC",
"PRODUCT_DIM"."TOTAL_PRODUCT_ID" "TOTAL_PRODUCT_ID"
FROM "PRODUCT_DIM" "PRODUCT_DIM"
);
One point to note here is that the NOLOGGING clause doesn’t mean that
every DML operations on the table works in NOLOGGING mode - it’s only bulk
operations like INSERT /*+ APPEND */ and Direct Loader (SQL*Loader) that take
advantage of it. Regular INSERTs, DELETEs and UPDATEs will generate normal redo,
even when you specify NOLOGGING. This is a popular misconception as logic would
suggest that the NOLOGGING table creation clause would suppress all redo log
generation, not just direct path loads, but all the NOLOGGING clause does is
specify NOLOGGING for the table creation process itself (in case you’re using
CREATE TABLE … AS SELECT) and for any subsequent direct path insertions.
Another misconception is that NOLOGGING suppresses all redo, even when using
direct path loads. What actually happens is that NOLOGGING operations generate
minimal redo, not no redo whatsoever, with this minimal redo generated so that
the data dictionary itself is preserved. It’s only a relatively small amount,
but it’s worth noting that some will be produced, whatever steps you take.
With indexes and partition creation, it’s a similar story - you can specify
NOLOGGING in the object creation clause, and then certain operations (in the
case of indexes, creation and ALTERs (rebuilds)) can take advantage of NOLOGGING.
So what about tablespaces? What do you get when you specify the NOLOGGING
clause when creating a tablespace?
CREATE TABLESPACE dims
NOLOGGING
DATAFILE ‘c:\oradata\dims01.dbf’ SIZE 50M
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 256K
SEGMENT SPACE MANAGEMENT AUTO
ONLINE;
What this actually does is ensure that the default LOGGING|NOLOGGING
attribute for objects created in this tablespace is set to NOLOGGING. You can
still subsequently create a table as LOGGING in a tablespace you define as
NOLOGGING, all the NOLOGGING attribute does for a tablespace is set the default
for objects later created in the tablespace.
Once you’ve bulk loaded your data, created your table using CREATE TABLE …
AS SELECT or rebuilt your index, the important next step is to perform a
baseline backup of the datafiles that contain your objects; once you’ve done
this, and until such time as you carry out any other unrecoverable DML, you can
then recover your objects like any other. This baseline backup
can be a hot backup (as long as you’re in ARCHIVELOG mode, which in our case
we are) and this backup then ensures that we have a baseline copy of the object
within our backup set if we then need to perform a full recovery.
So where does the ARCHIVELOG setting for databases come into things?
The general advice for setting up warehouse target databases is to put the
database into NOARCHIVELOG mode. NOARCHIVELOG suppresses the archiving of redo
log files and improves the performance of data loads as you don’t have the ARCn
processes running around in the background and holding things up as they copy
the redo log files off into your archive locations. The obvious cost of this is
that you will only have limited options when recovering your instance -
basically if you’ve overwritten your redo log files and not had the contents
archived off, and you have a media failure, you can only restore your instance
back to the last full cold backup you took (as the redo required to roll forward
from this point has probably been overwritten, and you didn’t archive it off
beforehand). However, for data warehouses, this isn’t usually an issue, as you
can just reload from the source files or databases, and the benefits of your
data load being quicker are usually greater than the (slim) chance that you’ll
need to perform a media recovery.
What we have here then are two separate concepts - ARCHIVELOG and
NOARCHIVELOG mode, which determines whether redo logs are archived off when they
fill up, or just overwritten and the redo then lost, and LOGGING | NOLOGGING,
where we either allow all redo to be generated as normal, or in certain
circumstances (the important ones, when we need to load lots of data) suppress
the majority of redo, at the cost of our operation being unrecoverable and
subsequently needing to reload the objects from the source files. These two
concepts interact in two ways:
- If we’re looking to suppress as much redo as possible, chances are we’d
like to suppress archiving as well (and we might as well, as the objects
we’re loading are unrecoverable anyway), and - If you specify NOARCHIVELOG mode for a database, all indexes, tables and
partitions are
automatically set to NOLOGGING, and therefore all INSERT /*+ APPEND */
and SQL*Loader direct load operations will automatically generate minimal
redo.
However, in our instance, the DBAs want to keep the database in ARCHIVELOG
mode, so that they can manage the database in the same way as the OLTP databases
and ship the archivelogs off to their DR site to maintain their standby
databases. As I was reading through the various articles and questions on Asktom,
I noticed
this answer which suggested that there could be some complications when
using NOLOGGING operations when you later on wanted to ship the archivelogs to a
standby database:
"when using a standby database for disaster recovery — you cannot
perform nologging operations without performing additional, manual work (in
fact, in 9i,
we can set a flag that says "ignore the request for nologging — log it
anyway" on the database to avoid this issue alltogether).Please read the standby database documention — it does explain this in some
detail:"
The answer then points through to the
Oracle 8.1.6 documentation for Standby Database which talks about the
problems you’ll get when you try and apply the (minimal) redo logs that were
written for our unrecoverable transactions to your standby database:
"In some SQL statements, the user has the option of specifying
the NOLOGGING clause, which indicates that the database operation is not
logged in the redo log file. Even though the user specifies the NOLOGGING
clause, a redo log record is still written to the redo log. However, when
the redo log file is transferred to the standby site and applied to the
standby database, a portion of the datafile is unusable and marked as being
unrecoverable. When you either activate the standby database, or open the
standby database with the read-only option, and attempt to read the range of
blocks that are marked as "UNRECOVERABLE," you will see error messages
similar to the following:ORA-01578: ORACLE data block corrupted (file # 1, block # 2521) ORA-01110: data file 1: '/vobs/oracle/dbs/stdby/tbs_1.f' ORA-26040: Data block was loaded using the NOLOGGING option"
good, and certainly I can’t imagine our DBA chuckling to
himself when at 2.30 in the morning he’s called to recover our database and that
error message comes up. It’s not as bad as it seems actually, as the article
then goes on to explain how you recover from this situation, but the point that
Tom was trying to make is that NOLOGGING isn’t something you should do and then
tell no-one about - you should understand what it does, talk it through with the
DBA and have a plan in mind for when you do eventually need to perform a
recovery and work around the fact that you may have some objects that are
unrecoverable - if you’ve done your baseline backup after your bulk load, it
shouldn’t affect you, but if your data centre gets hit by a meteorite then it
may well affect the process of starting up your standby database. Given that
Standby Database is now Data Guard with 10g, I checked the 9i Data Guard
documentation (the 10g documentation seems to be offline at the moment) and
found this comment, which suggests that in fact we might want
to actually reconsider our use of NOLOGGING operations if
“For logical standby databases, when SQL apply operations
encounter a redo log record for an operation performed with theNOLOGGINGclause, it skips over the record and continues applying
changes from later records. Later, if an attempt is made to access one of
the records that were updated withNOLOGGINGin effect, the
following error is returned:ORA-01403 no data foundTo recover after the
NOLOGGINGclause is
specified, re-create one or more tables from the primary database, as
described in
Section 9.1.6.
Note:
In general, use of the
NOLOGGING
clause is not recommended. Optionally, if you know in advance
that operations using theNOLOGGINGclause will be
performed on certain tables in the primary database, you might
want to prevent the application of SQL statements associated
with these tables to the logical standby database by using the
DBMS_LOGSTDBY.SKIP procedure."e>
, use of the NOLOGGING clause is not recommended. Optionally, if
you know in advance that operations using the NOLOGGING clause will be performed
on certain tables in the primary database, you might want to prevent the
application of SQL statements associated with these tables to the logical
standby database by using the DBMS_LOGSTDBY.SKIP procedure.
So, in summary, in our situation where we want to minimise the amount of redo
generated for our dimension and fact table loads, but we want to work in
ARCHIVELOG mode and ship our logs to a standby database, I’d say the best
approach would be to:
- Keep the database in ARCHIVELOG mode
- Create separate tablespaces, with their own datafiles, for the
dimensions and facts that we are going to bulk load, and create these
tablespaces as NOLOGGING