BI Blogs

Bringing together Business Intelligence voices from across the web

forget-me-not panties : with sensatech technology

Posted on the May 31st, 2005. Read times

Source: OLAP/BI/IM stuff [link]

Nice! This is a joke I nuderstand, but does cause the mind to jog… The Military would probably consider something like this in the future (bringing an Aliens (the movie) type of experience to platoon leaders, where blips just fall of the screen as people die. For business use, this probably isnt a bad idea for workers in hazardous places.

forget-me-not panties : with sensatech technology: “These panties will monitor the location of your daughter, wife or girlfriend 24 hours a day, and can even monitor their heart rate and body temperature.

Based on pioneering research developed by the U.S. military at DARPA (Defense Advanced Research Projects Agency), we have brought this revolutionary technology, previously only available to the military, to you! “

OracleAppsBlog - A day in the life of an Oracle Applications Consultant

Posted on the May 31st, 2005. Read times

Source: OLAP/BI/IM stuff [link]

OracleAppsBlog - A day in the life of an Oracle Applications Consultant: “Jo Davis, an Oracle e-Business Suite Functional Consultant in Sydney, New South Wales, Australia has just started a blog entitled Confessions of an Oracle Applications Consultant. “

Surrogate key generation in SSIS

Posted on the May 30th, 2005. Read times

Source: SQL BI [link]

In the last few days I tested several ways to generate surrogate keys when loading a dimension into a data mart.

The classical approach with SQL Server is to rely on a INT IDENTITY column. It works fine, but you don’t know what surrogate key has been generated for a given application key until you read the row from the table. Not a big issue with DTS, because you can’t do very much (in a way faster than SQL itself) even if you know this number, but with SSIS if you want to push performances to the limit, you need to load a fact table within the dimension table(s) without reading what you just writed into a table.

The first step in this direction is to generate surrogate keys inside a SSIS package. If you do a complete dimension load at each run, you can solve the whole problem using a scripting component inside a data flow task. Read this article from Jamie Thomson to see how you can implement it. But life is never as easy as you could desire, so when you have to incrementally load a dimension table then you need to give to this script the right initial value (the last ID currently used instead of 0).

After several approaches discussed on the newsgroup, I came to the conclusion that the most affordable way to implement it is a not-so-desirable one:

  • define a Execute SQL Task which execute a SELECT COALESCE( MAX( ID_Dimension ), 0 ) AS LastID FROM Dimension and put the single row ResultSet into a variable (let’s say we use User::ID_Dimension)

  • define a Data Flow Task in the same way Jamie did in his article

  • put the User::ID_Dimension variable into the ReadOnly Variables collection in the Script Component

  • define PreExecute method inside the script to initialize the value for counter you will use to enumerate new surrogate keys (see code below)

Public

NT>Class ScriptMain

   

ONT> UserComponent

   

> counter As Int32

   

T> Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)

        counter = counter + 1

        Row.IDDimension = counter

   

FONT> Sub

;   T> Overrides Sub PreExecute()

       

T>.PreExecute()

        counter = Variables.IDDimension

   

> Sub

End

NT>Class

That’s it. You can download the sample package here.

Now, don’t pretend this solution to be faster than a SQL Server INT IDENTITY column… by now, it isn’t, but we can’t judge this with current CTPs. But don’t desperate: it could be better in the future and, more important, if you really need to use the generated keys in other transformations you could already take advantage from this architecture.

Why it’s a not-so-desirable way? Well, I don’t like to rely on a variable defined in the control flow while all the dimension-load logic is inside a data flow task. Imagine how ugly could become a SSIS package if you load 20 or 30 dimensions; of course you can group two or more tasks together, but I consider this only as a work-around. There is another issue if you would like to update your somewhere-in-the-package variable with the last generated surrogate key, because you can’t use the same variable as ReadOnly and ReadWrite (in the Script Transform) and you can’t read a variable passed as ReadWrite outside the PostExecute method (I’m right: you can only read/write it inside PreExecute overridden method, you can read it in the PreExecute methos - as in my sample - only if variable is passed as ReadOnly).

I thought other ways to find a better solution (most of them has been suggested by kindly people on the newsgroup):

  • Script transformation with two inputs: not supported (I would like to pass the real data source as Input1 and the last used ID as Input2)
  • Cross Join between two inputs (defined as the previous idea)
  • Custom data flow task (with one input and a property with table/column to look for the last ID) - it needs to be written

other possible variations on those themes. But I have to say that SSIS could be improved very much to make life easier for the BI developer: a real “definitive” surrogate key generation task could be the right answer to this kind of needs.

¿Está en crisis el software empresarial?

Posted on the May 30th, 2005. Read times

Source: Mr. BI [link]



Esto es lo que dice este artículo en el IT Journal a propósito de la última caída de ventas de Siebel de un 40% en el último trimestre. Comenta que uno de los competidores que está erosionando las ventas de Siebel es salesforce.com, que aunque gartner (lo podéis leer en todoBI) no lo ve apropiado para grandes empresas, es configurable y ya hay compañías especializadas en hacer personalizaciones mas complejas. Pero como en internet los negocios vuelan, también comenta el artículo que a salesforce le quita ventas SugarCRM desarrollado en código abierto, al igual que CentricCRM. Yo modestamente creo que la crisis de Siebel no tiene que ver tanto con salesforce.com y otros sino más bien con que es un software de nicho, que ha tenido un crecimiento explosivo y habiendose llevado muchos grandes proyectos, ahora le cuesta seguir el ritmo porque ya no quedan grandes empresas dispuestas a pagar esos precios por unas funcionalidades menos imprescindibles que por ejemplo los sistemas contables que vende SAP. Pero Siebel seguirá vendiendo a empresas medianas y grandes. La aparición de software barato, o gratis como freeCRM.com que se financia con banners; lo que permite es que pequeñas empresas e incluso particulares, puedan disfrutar de un crm, o de una web corporativa con sus estadísticas, un servicio de correo, una suite de office..(tenéis más en la gacela).. Esto en mi opinión fomentará la aparición de emprendedores, que desde su casa con una banda ancha y un teléfono móvil, ya tendrán todo lo necesario para funcionar como empresa.

Off To Manchester…

Posted on the May 30th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

Well that’s it for the rest of the week for me. I’m off up to Manchester for
Jonathan Lewis’ "Optimising
Oracle, Performance By Design"
seminar for the next three days, and as no
doubt I’ll be back in the hotel each evening trying out some of Jonathan’s ideas
on the laptop I’m definitely leaving the blog alone for the next few days. When
I’m back from the course I’ll go through some of the concepts that Jonathan
raises, and I’m particularly looking forward to an in-depth look at the
cost-based optimiser, latches and locks and parsing and optimiser costs. How sad
am I? Anyway, bye for now and I’ll be back next week.

Is RAC Suitable For Data Warehousing?

Posted on the May 30th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

Another one of our clients is looking to put some hardware in place for a new
data warehousing project. The two hardware options are:

  1. Buy a single, large Unix server (probably a HP box running HP-UX) with 8
    CPUs
  2. Buy two or more commodity "Lintel" boxes, with 2 or 4 processors,
    running Red Hat Advanced Server, and use RAC to create a single database
    instance.

But just how suitable is Real Application Clusters for data warehousing? The reason I started doubting RAC’s suitability was an article I came across entitled
"Oracle Real Application
Clusters and Data Warehouse Applications"
which stated the following:

"Oracle has marketed the Real Application Clusters (RAC) option as
appropriate for all types of applications, but there is one very important
exception, Oracle data warehouses. Oracle RAC is used primarily for
continuous availability of mission-critical systems (i.e. banking
applications) and for transparent scalability of massive online transactions
systems (i.e. Amazon). However, it is not always the best architecture for
data warehouse that require large materialized view rollups because they
often require large, cohesive RAM regions (large data sorts) and
tightly-coupled CPU?s (fast parallel query) … In sum, Oracle Real
Application Clusters is a wonderful tool for mission-critical database that
must have continuous availability (using RAC with Transparent Application
Failover (TAF)) and for scalability of super-large OLTP systems, Oracle RAC
is not the best choice for data warehouse applications that require
high-speed table scan performance."

Now I know Don and Mike have spent quite a bit of time working with RAC, so I
thought it wise to look into this in a bit more detail. What I did in the end
was

post a link to the article
on the

Dizwell Forum
, and ask the forum participants for their experiences with RAC
and data warehousing. I won’t repeat the whole thread (which went on for three
pages, so it’s obviously an area of interest) but if you’re considering RAC for
a data warehousing project,

take a look at the thread
and
Don’s
article
, as they’re both useful bits of background reading. My conclusion? A
large, single server and a collection of clustered servers aren’t
interchangeable concepts; there could be issues with RAC with "edge cases", for
instance (as mentioned in the forum comments) where a parallel query needs to
run over multiple nodes, and the "large cohesive memory area" issue I still
think could be an problem if you need more memory than a single small server can
provide. The lesson I guess though is that there’s no "black and white" rule for
this sort of thing (not that the original article was suggesting this) and it’s
worth getting a bit of peer advice if you’re looking to get an answer to a
question such as this, with presumably big implications if you get it wrong.

From Web Services to SOA and Everything in Between: The Journey Begins Highlights - WebServices.Org

Posted on the May 30th, 2005. Read times

Source: OLAP/BI/IM stuff [link]

A pretty detailed report:

From Web Services to SOA and Everything in Between: The Journey Begins Highlights - WebServices.Org: “The survey found that while there are relatively few advanced Web services sites ? and even fewer bona fide SOA implementations ? managers and executives clearly understand the value proposition this new architecture has to offer. A large number of respondents are currently either deploying simple interfaces, or have new projects under development, the survey finds. Most businesses that do have efforts underway report they are seeing tangible benefits in terms of reuse, developer productivity, and cost savings.

The survey found that 35% of businesses have or plan to have a fully functioning Web services infrastructure in the near future. At this point, about 18% can be considered ?SOA-ready,? meaning they are part of an infrastructure of standardized, loosely coupled services. For most companies, the hard work still lies ahead. About 24% have simple point-to-point implementations, and 29% are still in the development stages. ”

This is a major trend with extreme potential for cost reductions, and its within a stone’s throw of reaching critical mass.

eCuaderno

Posted on the May 29th, 2005. Read times

Source: Mr. BI [link]



No suelo mirar mucho las estadísticas, ya que no espero que un blog sobre un tema tan particular como el BI llegue a tener nunca más de 100 visitantes diarios, pero últimamente estaban un poco alteradas como la primavera, y el motivo es que un blog tan prestigioso como eCuaderno de Jose Luis Orihuela, me ha incluido entre una amplia lista de blogs recomendados en el apartado de innovación. Muchas gracias, estas cosas animan a no dejar de escribir.

“When Smart People Defend Bad Ideas”

Posted on the May 29th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]


"Scott Berkun, Smart People, Bad Ideas"
: "Scott Berkun, former Microsoftie,
just got
slashdotted
with an essay about
"Why smart people defend
bad ideas"
. It’s a very interesting essay. I’m not crazy about his examples,
but his portrayal of the bad-idea-defender is dead on: "Until they come face
to face with someone who is tenacious enough to dissect their logic, and
resilient enough to endure the thinly veiled intellectual abuse they dish out
during debate (e.g. ‘You don?t really think that do you?’ or ‘Well if you knew
the rule/law/corollary you wouldn?t say such
things’), they?re never forced to question their ability to defend bad ideas."

And then again: eople, or at least those whose brains have good first
gears, use their speed in thought to overpower others. They?ll jump between
assumptions quickly, throwing out jargon, bits of logic, or rules of thumb at a
rate of fire fast enough to cause most people to become rattled, and give in.
When that doesn?t work, the arrogant or the pompous will throw in some
belittlement and use whatever snide or manipulative tactics they have at their
disposal to further discourage you from dissecting their ideas."

Another good blog posting from

Adam Barr
(author of

"Proudly Serving My Corporate Masters"
, my favourite "working at Microsoft"
book). I thought Adam’s Post, and Scott Berkun’s essay, were particularly
relevant considering the debates that have gone on regarding Oracle performance
tuning. Probably a good essay to read in conjunction with

"Crimes Against Logic"
.

Creating OLAP DML Formulas Using AWM Templates

Posted on the May 29th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

One of the most powerful features of the multidimensional engine behind
analytic workspaces is the ability to create formulas. Formulas, or "calculated
measures" as they’re referred to in AWM10g, are measures that are derived from
other measures. Using AWM, you can create simple formulas that reference other
measures in a cube, allowing you for example to create a "margin" measure
derived from sales and costs measures. If you’re an old Express hand though,
you’ll know that this simple type of formulas is just the tip of the iceberg,
and what you often used to end up doing was creating for example a three
dimensional formula based on measures from four and five dimensional variables,
rolling up unneeded dimensions and pulling in variables held in what would now
be referred to as "cubes".

To take an example, say that you had a table of branches:

SQL> select * from branches;

 BRANCH_ID BRANCH_DESC      REGION_ID REGION_DESC     TOTAL_BRANCHES_ID TOTAL_BRANCHES_
---------- --------------- ---------- --------------- ----------------- ---------------
         1 Brighton                10 South-East                    100 All Branches
         2 Worthing                10 South-East                    100 All Branches
         3 Charing Cross           11 London                        100 All Branches
         4 Liverpool               12 North-West                    100 All Branches
         5 Manchester              12 North-West                    100 All Branches

and a table of accounts:

SQL> select * from accounts;

ACCOUNT_ID ACCOUNT_DESC    TOTAL_ACCOUNTS_ID TOTAL_ACCOUNTS_
---------- --------------- ----------------- ---------------
         1 Wages                         100 All Accounts
         2 Sales                         100 All Accounts

and then a table of account balances:

SQL> select * from balances;

ACCOUNT_ID  BRANCH_ID    BALANCE
---------- ---------- ----------
         1          1         50
         2          1         80
         1          2         40
         2          2         40
         1          3        100
         2          3        125
         1          4         80
         2          4         60
         1          5         90
         2          5         95

10 rows selected.

We then create an analytic workspace with account and branch dimensions, and
a balances cube:

Once the analytic workspace has been loaded from our source tables, we can open up the OLAP Worksheet
and take a look at what’s been created.

-> listnames
   39 DIMENSIONs                      52 VARIABLEs
   --------------------------------   --------------------------------
   ACCOUNTS                           ACCOUNTS_ACCOUNT_H_HIERDEF
   ACCOUNTS_HIERLIST                  ACCOUNTS_ACCOUNT_LEVELDEF
   ACCOUNTS_LEVELLIST                 ACCOUNTS_COLUMN_COUNT
   AGGREGATE_DIMENSION_PROP           ACCOUNTS_COLUMN_MAP
   AGGREGATE_GENERIC_PROP             ACCOUNTS_CREATEDBY
   ALLOCATE_DIMENSION_PROP            ACCOUNTS_HIER_IS_VALUE
   ALLOCATE_GENERIC_PROP              ACCOUNTS_IS_SESSION
   ALL_ATTRIBUTES                     ACCOUNTS_LONG_DESCRIPTION
   ALL_ATTRTYPES                      ACCOUNTS_SHORT_DESCRIPTION
   ALL_CALC_MEMBERS                   ACCOUNTS_TOTAL_ACCOUNTS_LEVELDEF
   ALL_CUBES                          AGGREGATE_DIMENSION_CATALOG
   ALL_DESCTYPES                      AGGREGATE_GENERIC_CATALOG
   ALL_DIMENSIONS                     ALLOCATE_DIMENSION_CATALOG
   ALL_HIERARCHIES                    ALLOCATE_GENERIC_CATALOG
   ALL_LANGUAGES                      ALL_DESCRIPTIONS
   ALL_LEVELS                         ALL_TOOLS_PROP
   ALL_MEASUREFOLDERS                 ATTR_DATA_MAP
   ALL_MEASURES                       ATTR_VISIBLE
   ALL_MODELS                         AW_NAMES
   ALL_OBJECTS                        BALANCES_BALANCE_COUNTVAR
   ALL_SOLVEDFNS                      BALANCES_BALANCE_STORED
   ALL_SOLVEGROUPS                    BRANCHES_BRANCHES_H_HIERDEF
   ALL_SOLVES                         BRANCHES_BRANCH_LEVELDEF
   BALANCES                           BRANCHES_COLUMN_COUNT
   BRANCHES                           BRANCHES_COLUMN_MAP
   BRANCHES_HIERLIST                  BRANCHES_CREATEDBY
   BRANCHES_LEVELLIST                 BRANCHES_HIER_IS_VALUE
   CALC_MEMBER_PROP                   BRANCHES_IS_SESSION
   COLUMN_DIM                         BRANCHES_LONG_DESCRIPTION
   CUBE_PROP                          BRANCHES_REGION_LEVELDEF
   DIM_OBJ_LIST                       BRANCHES_SHORT_DESCRIPTION
   FORECAST_PROP                      BRANCHES_TOTAL_BRANCHES_LEVELDEF
   GEN_OBJ_ROLES                      CALC_MEMBER_CATALOG
   GID_DIMENSION                      CUBE_CATALOG
   IS_LOADED_DIMENSION                DIMKEY_IS_UNIQUE
   MAPGROUP_DIM                       DIM_AW_OBJS
   MEASURE_PROP                       DIM_KEY_MAP
   TIME_GLEVEL_DIMENSION              FORECAST_CATALOG
   TIME_OFFSET_DIMENSION              GEN_AW_OBJS
                                      MEASURE_CATALOG
                                      MEAS_DATA_MAP
                                      MEAS_KEY_MAP
                                      MEAS_OPERATOR_MAP
                                      OBJECT_LOADED
                                      OBJ_CREATEDBY
                                      OBJ_ORIGINATOR
                                      PARENT_KEY_MAP
                                      PARENT_LVL_MAP
                                      SOLVEDFN_TYPE
                                      SOLVE_MEMBER_SELECTION
                                      VISIBLE
                                      ___XML_USER_AW_VERSION

   1 PROGRAM                          1 FORMULA
   --------------------------------   --------------------------------
   ONATTACH                           BALANCES_BALANCE

   37 RELATIONs                       5 COMPOSITEs
   --------------------------------   --------------------------------
   ACCOUNTS_FAMILYREL                 ATTR_MAP_COMPOSITE
   ACCOUNTS_FAMILYRELVAL              BALANCES_COMPOSITE
   ACCOUNTS_GID                       HIERLVL_MAP_COMPOSITE
   ACCOUNTS_LEVELREL                  LVL_MAP_COMPOSITE
   ACCOUNTS_LOADED                    MEAS_MAP_COMPOSITE
   ACCOUNTS_PARENTREL
   BRANCHES_FAMILYREL
   BRANCHES_FAMILYRELVAL
   BRANCHES_GID
   BRANCHES_LEVELREL
   BRANCHES_LOADED
   BRANCHES_PARENTREL
   CALC_MEMBER_BASE_DIMENSION
   CUBE_AGGREGATION
   CUBE_DFLT_PARTITION_HIERARCHY
   CUBE_DFLT_PARTITION_LEVEL
   CUBE_MEASURES
   DEFAULT_HIER
   DIM_ATTRIBUTES
   DIM_HIERARCHIES
   DIM_LEVELS
   DYNAMIC_MEAS_AGGREGATION
   FOLDER_PARENTREL
   HIER_SORT_ATTR
   MAPGROUP_CUBEREL
   MAPGROUP_DIMREL
   MAPGROUP_HIERREL
   MAPGROUP_LVLREL
   MEAS_DOMAIN
   MEAS_PARTITION_HIERARCHY
   MEAS_PARTITION_LEVEL
   MODEL_BASE_DIMENSION
   RELATIONAL_ATTRIBUTE_DATA
   RELATIONAL_MEASURE_DATA
   SOLVE_BASE_MEAS
   SOLVE_SOLVEDFN
   SOLVE_SOURCE_MEAS

   2 MODELs                           22 VALUESETs
   --------------------------------   --------------------------------
   BALANCES_ACCOUNTS_AWXMLMODEL       ACCOUNTS_AGGRDIM_VSET
   BALANCES_BRANCHES_AWXMLMODEL       ACCOUNTS_AGGRHIER_VSET
                                      ACCOUNTS_HIER_LEVELS
                                      ACCOUNTS_INHIER
                                      ACCOUNTS_LOAD_STATUS_VSET
                                      BRANCHES_AGGRDIM_VSET
                                      BRANCHES_AGGRHIER_VSET
                                      BRANCHES_HIER_LEVELS
                                      BRANCHES_INHIER
                                      BRANCHES_LOAD_STATUS_VSET
                                      CALC_MEMBERS_IN_MODEL
                                      CALC_MEMBER_OTHER_DIMENSIONS
                                      CUBE_COMPOSITE_BASES
                                      CUBE_DIMENSIONS
                                      DEPENDENT_MEASURES
                                      MEAS_COMPOSITE_BASES
                                      MEAS_IN_FOLDER
                                      MODEL_OTHER_DIMENSIONS
                                      SOLVEDFN_CALCULATION_ORDER
                                      SOLVEDFN_SOLVE_ORDER
                                      SOLVE_ORDER
                                      SOLVE_TARGET_MEAS

   4 AGGMAPs                          11 SURROGATEs
   --------------------------------   --------------------------------
   OBJ1962518006                      ACCOUNTS_ACCOUNT_SURR
   OBJ1962518006_PRT_PRTAGGMAP        ACCOUNTS_HIERLIST_SURR
   OBJ1962518006_PRT_RUNAGGMAP        ACCOUNTS_LEVELLIST_SURR
   OBJ1962518006_PRT_TOPAGGMAP        ACCOUNTS_TOTAL_ACCOUNTS_SURR
                                      BRANCHES_BRANCH_SURR
                                      BRANCHES_HIERLIST_SURR
                                      BRANCHES_LEVELLIST_SURR
                                      BRANCHES_REGION_SURR
                                      BRANCHES_TOTAL_BRANCHES_SURR
                                      __XML_GENERATED_1
                                      __XML_GENERATED_2

What we’ve got here is the two dimensions we’ve created (ACCOUNTS and
BRANCHES), the measure BALANCES_BALANCE_STORED, and a whole load of additional
objects that make up the standard form metadata. The measure, which we called
BALANCES in the AWM Model view, is named within the AW using the format
CUBENAME
_MEASURENAME_STORED.

We can then take a look at the ACCOUNTS dimension that has been set up,
listing out the member ID (taken from our ACCOUNT_ID source column) and the long
description (taken from the ACCOUNT_DESC source column).

->rpr down accounts w 30 accounts_short_description

               --ACCOUNTS_SHORT_DESCRIPTION--
               --------ALL_LANGUAGES---------
ACCOUNTS           ENGLISH_UNITED KINGDOM
-------------- ------------------------------
TOTAL_ACCOUNTS All Accounts
_100
ACCOUNT_1      Wages
ACCOUNT_2      Sales

Do the same for the BRANCHES dimension,

->rpr down branches w 30 branches_short_description

               --BRANCHES_SHORT_DESCRIPTION--
               --------ALL_LANGUAGES---------
BRANCHES           ENGLISH_UNITED KINGDOM
-------------- ------------------------------
TOTAL_BRANCHES All Branches
_100
REGION_10      South-East
REGION_11      London
REGION_12      North-West
BRANCH_1       Brighton
BRANCH_2       Worthing
BRANCH_3       Charing Cross
BRANCH_4       Liverpool
BRANCH_5       Manchester

and then list out the contents of the measure.

->rpr balances_balance_stored

               -------------------------------------BALANCES_BALANCE_STORED--------------------------------------
               ---------------------------------------------BRANCHES---------------------------------------------
               TOTAL_BRAN
ACCOUNTS        CHES_100  REGION_10  REGION_11  REGION_12   BRANCH_1   BRANCH_2   BRANCH_3   BRANCH_4   BRANCH_5
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
TOTAL_ACCOUNTS     760.00     210.00     225.00     325.00     130.00      80.00         NA         NA         NA
_100
ACCOUNT_1          360.00      90.00     100.00     170.00      50.00      40.00     100.00      80.00      90.00
ACCOUNT_2          400.00     120.00     125.00     155.00      80.00      40.00     125.00      60.00      95.00

Now, say that we wanted to create a new measure, that contained the
percentage of sales that wages represented. This measure would have one
dimension, BRANCHES, and would be calculated by taking the BALANCES measure for
each branch and dividing wages by sales then multiplying by 100. We could do
this at the relational end, creating a new table for this measure, calculating
the percentage and then loading it into a RATIOS cube. Old Express hands though
would create a formula instead, dimensioned by BRANCHES, that derived the value
from the BALANCES measure.

->define wages_pct_of_sales formula decimal <BRANCHES>

->eq (BALANCES_BALANCE_STORED(ACCOUNTS 'ACCOUNT_1') / BALANCES_BALANCE_STORED(ACCOUNTS 'ACCOUNT_2'))*100

->update

->commit

which when queried would give the correct results:

->rpr wages_pct_of_sales

               WAGES_PCT_
BRANCHES        OF_SALES
-------------- ----------
TOTAL_BRANCHES      90.00
_100
REGION_10           75.00
REGION_11           80.00
REGION_12          109.68
BRANCH_1            62.50
BRANCH_2           100.00
BRANCH_3            80.00
BRANCH_4           133.33
BRANCH_5            94.74

The problem with this approach though, is when you’re working with Oracle
OLAP and analytic workspaces creating the fornula isn’t enough - to display it
as a calculated measure in AWM and Discoverer for OLAP, you’ve got to create all
the associated standard form metadata. Now whilst this is undoubtedly possible,
it’s by no means a simple affair (I’ve yet to get this working) and the metadata
itself changes from release to release. Therefore, what you’ve got to do is take
your formula definition and process it through AWM. Thanks for Anthony Waite and
Bud Endress for explaining how this takes place.

The first step is to create a new cube, in my case called RATIOS, that will
hold my new calculated measure. This cube will have one dimension, BRANCHES, as
ACCOUNTS is being rolled up into the ratio. Note that the cube has no measures.

The next step then is to use a text editor to create an AWM template file. In
my case, the template looked like this:

<Create Id="Action315475">
    <ActiveObject>
          <DerivedMeasure  Name="WAGES_PCT_OF_SALES" LongName="Wages % of Sales"
		   ShortName="Wages/Sales" PluralName="Wages % of Sales"
		   Id="RATIOS.WAGES_PCT_OF_SALES.MEASURE" DataType="decimal" isInternal="false"
		   UseGlobalIndex="false" ForceCalc="false" ForceOrder="false"
		   SparseType="STANDARD" AutoSolve="DEFAULT"
		   ExpressionText="(BALANCES_BALANCE_STORED(ACCOUNTS ‘ACCOUNT_1′) / BALANCES_BALANCE_STORED(ACCOUNTS ‘ACCOUNT_2′))*100"/>
    </ActiveObject>
</Create>

The bits in bold are the bits that I had to change to suit my formula. Two
points to note on this:

  1. The ID is made up of CUBE_NAME.FORMULA_NAME.MEASURE, where CUBE_NAME is
    the name of the one dimension cube I just set up, FORMULA_NAME is the name
    of my formula, and "MEASURE" is just a literal - i.e. just type in
    "MEASURE".
  2. The ExpressionText is the text of the formula definition, minus the "eq"
    at the start.

Save the template, and then right-click on the calculated measures node in
the RATIOS cube, and create a calculated measure from the template file. Once
the template is loaded, the new calculated measure should be visible within AWM.

Now, if you use the View Data option to look at the calculated measure within
AWM, you should see the values as expected.

Note as well how the aggregation has been carried out for you,
properly aggregating the percentages rather than just SUMming them up. As it’s a
formula, there’s no need to separately process this cube - as long as the base
data has been loaded and aggregated, the formula then picks up the values and
displays them without further processing.

Finally, if I then go into the OLAP Worksheet again, I can view
my formula and then display the values.

->listnames formulas
   2 FORMULAs
   -------------------------
   BALANCES_BALANCE
   RATIOS_WAGES_PCT_OF_SALES

->rpr ratios_wages_pct_of_sales

               RATIOS_WAG
               ES_PCT_OF_
BRANCHES         SALES
-------------- ----------
TOTAL_BRANCHES      90.00
_100
REGION_10           75.00
REGION_11           80.00
REGION_12          109.68
BRANCH_1            62.50
BRANCH_2           100.00
BRANCH_3            80.00
BRANCH_4           133.33
BRANCH_5            94.74

The points to take away from this are that firstly, you’re not restricted to
just the calculations that AWM gives you (as long as you know a smattering of
OLAP DML) and that secondly, if you want to manually create these calculations,
don’t try and create them using the OLAP Worksheet and OLAP DML, use AWM
templates instead as the standard form metadata is automatically created for
you, and the definition of the calculation is then preserved in the AW template
and can be saved along with the rest of the AW definition.

Another Batch Of New Oracle Blogs

Posted on the May 29th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

Another new batch of Oracle blogs:
Jeff Hunter
(not to be confused with Jeffrey Hunter),
Robert Vollman, Frank van Bortel and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.

Another Batch Of New Oracle Blogs

Posted on the May 29th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

Another new batch of Oracle blogs:
Jeff Hunter
,
Robert Vollman, Frank van Bortel and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.

IT Manager’s Journal | Is the enterprise software licensing business dying?

Posted on the May 29th, 2005. Read times

Source: OLAP/BI/IM stuff [link]

IT Manager’s Journal | Is the enterprise software licensing business dying?: “Siebel Systems Inc. shows the trend starkly. Siebel is a poster child for the bad old days; software that is very complicated and expensive to acquire and implement. Buyers have had enough. License revenue declined 40 percent in the most recent quarter compared to a year ago.”

Another Batch Of New Oracle Blogs

Posted on the May 28th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

Another new batch of Oracle blogs:
Jeff Hunter
(make sure you’re not holding a hot cup of coffee when you read
this one),
Robert Vollman, Frank van Bortel and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.

Another Batch Of New Oracle Blogs

Posted on the May 28th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

Another new batch of Oracle blogs:
Jeff Hunter
(make sure you’re not holding a hot cup of coffee when you read
this one),
Robert Vollman and
Peter K.
Niall’s site has also
just gone through a bit of a redesign (and spurred on by Tom’s comments it’s
updated a bit more frequently now :-)) and if you’ve not seen them before, make
sure you check out David Aldridge, Howard Rogers,
Pete S and
Mike Ault.

“Oracle 10g Data Warehousing”

Posted on the May 28th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]


I’ve
been meaning to post a plug for this one for a while.

"Oracle 10g Data Warehousing"
by Susan Hillson, Shilpa Lawande, Pete Smith
and Lilian Hobbs came out a while ago and is the update to the 9iR2 Data
Warehousing book that I talked about on my Books page. Although the 9iR2 book
was good, I thought it was a bit thin and didn’t really cover much that wasn’t
already in the manuals, however the 10g book is far more substantial and covers
a lot of ground that you don’t normally come across when going through the OWB,
Discoverer and Oracle OLAP documentation. Like the 9iR2 book it covers all of
the products that make up the 10g BI&W product stack, but what’s good about this
version is the degree to which it covers some of the stuff that you tend to
gloss over when working with the tools - stuff like taking backups, using
Enterprise Manager, using the new 10g features such as ASM - and as well has a
big section on using the new analytic and aggregating functions (cube, grouping
sets, rollup) that you get with 9i and 10g.

I’m usually fairly sceptical about printed books as there’s not much you
can’t already get on docs.oracle.com and asktom.oracle.com; however in practice
we’ve found this book invaluable and it’s become standard issue on a number of
big 10g data warehousing projects that we’re working on. The only criticism I
would say is that it only covers the first 10g release of the various tools -
Discoverer, OWB etc - and doesn’t therefore have anything on Discoverer "Drake"
or of course OWB "Paris". Also, like every Oracle BI&W book, the coverage of
Oracle OLAP is fairly sketchy, with nothing on the new 10.1.0.3 or 10.1.0.4
features, but of course that therefore leaves things open for my forthcoming
magnum opus :-)

Funnily enough I was one of the technical reviewers for this book, as I
stepped in at the last minute to tech review a couple of Pete Smith’s chapters.
Pete and I happened to be working together at the same client (for different
companies), with Pete as the DBA (dropping and recreating tables) and me as the
OWB developer (trying to keep OWB going with all my tables appearing and
disappearing). One thing that struck me at the time was the amount of work that
goes into writing a book - a typical blog article is for example an afternoon’s
work, say a day with the research and putting together articles - whilst a book
chapter, at around 20-30 pages, and where you have to get everything just
right
, is a lot more effort. The other issue around book writing is the fact
that the technical content has to be completely correct - I’ve experienced this
now with writing for OTN, where articles are technically reviewed by the Oracle
product team - and this review and revision process, whilst ensuring that
content is absolutely spot on, certainly takes up a lot of time - as much time
actually as the time it took to write the article in the first place.

Anyway, I’d thoroughly recommend the book, an excellent grounding in the
technology and processes around building a 10g data warehouse, and an excellent
introduction to 10g new features for those DW developers and DBAs up until now
working with 8i and 9i.

Getting To The Bottom Of NOLOGGING, NOARCHIVELOG and Standby Databases

Posted on the May 28th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

One of the customers I work with is looking to us for advice on how to set
their data warehousing database up. They’re looking to use the Enterprise
Edition of the latest Oracle version (10.1.0.4) and are planning on using Oracle
Warehouse Builder 10g as the ETL tool. Each environment (dev, test, prod) will
have two database instances, one to hold the OWB Design repository (with an 8k
block size) and one to hold the target data warehouse and the OWB Runtime
Repository (with a 16K block size). Setting up the design repository instance
isn’t an issue, it’s just a regular database set up using the default database
template, but the target data warehouse instance is a bit more interesting as
we’re looking to minimize the amount of redo log generated during the warehouse
initial build and then refresh.

Most people are aware that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces which under certain circumstances
causes redo generation to be switched off when the object is loaded. You’re also
probably aware that the database itself can be placed in NOARCHIVELOG mode,
which disables the process of redo log archiving which can otherwise slow down a
bulk load into a warehouse. However, what’s often not clear is which of these
options should be used and why you would use one rather than the other, and I
therefore thought it worthwhile going back over the documentation and trying to
set out what the optimal set of choices are. As usual, if you’re reading this
and you spot something that I’ve got wrong, or can add anything that makes the
process clearer, add a comment at the end and I’ll update the posting.

To put this in context, the client runs a number of other Oracle databases,
almost exclusively as the backend for their transactional systems, and the
general rule is that these databases are run in ARCHIVELOG mode, incremental hot
backups are taken during the week using RMAN followed by full hot backups at the
weekend, and the archived redo logs are stored off site and also used to refresh
their DR databases. Given a preference, they would rather run the new data
warehouse database in ARCHIVELOG mode, so that it can fit in with their existing
arrangements, but they are willing to perform the warehouse load and refresh in
NOLOGGING mode as long as we give them some downtime after the load to do a
full, cold backup of the warehouse.

What we would like to do is to somehow disable redo log generation for
initial loads into our dimension tables, and for initial and then subsequent
loads into the fact tables. The rationale for this is that the initial dimension
load, and all loads into the fact table, load a very large amount of data and we
can speed up this process if we don’t have to write redo information into the
redo log files - in other words, we dramatically reduce the amount of disk I/O
we need to perform to load a given object. If our database is running in
ARCHIVELOG mode, we get an additional benefit in that we don’t then need to
archive off these redo log files, again reducing the time to perform the load as
we don’t have to wait around for the ARCn process to archive the filled-up redo
log files. The downside of doing this though is that these data loads are
unrecoverable - that is, if the instance crashes or we have a media (disk)
failure, we wouldn’t be able to recoverer these objects. Whilst that isn’t a bit
deal for the objects we’re working with - we can just reload them from the
source files - it means that we’re going to have problems with any other objects
that are also in that datafile. What this means in practice is that we have to
do a full, cold backup of the datafiles that contain these objects before we can
consider the database "recoverable".

I mentioned earlier on that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces. Ignoring tablespaces for the
moment, what this means in practice is that you can create the table (for
example) using the NOLOGGING clause:

CREATE TABLE "PRODUCT"
(
"CLASS_DSC" VARCHAR2(50),
"CLASS_ID" NUMBER,
"FAMILY_DSC" VARCHAR2(50),
"FAMILY_ID" NUMBER,
"ITEM_BUYER" VARCHAR2(50),
"ITEM_DSC" VARCHAR2(50),
"ITEM_ID" NUMBER,
"ITEM_MARKETING_MANAGER" VARCHAR2(50),
"TOTAL_PRODUCT_DSC" VARCHAR2(50),
"TOTAL_PRODUCT_ID" NUMBER)
NOLOGGING
;

Then, you can carry out bulk loads into this table in NOLOGGING mode. For
example:

INSERT
/*+ APPEND */
INTO
"PRODUCT"
("CLASS_DSC",
"CLASS_ID",
"FAMILY_DSC",
"FAMILY_ID",
"ITEM_BUYER",
"ITEM_DSC",
"ITEM_ID",
"ITEM_MARKETING_MANAGER",
"TOTAL_PRODUCT_DSC",
"TOTAL_PRODUCT_ID")
(SELECT
"PRODUCT_DIM"."CLASS_DSC" "CLASS_DSC",
"PRODUCT_DIM"."CLASS_ID" "CLASS_ID",
"PRODUCT_DIM"."FAMILY_DSC" "FAMILY_DSC",
"PRODUCT_DIM"."FAMILY_ID" "FAMILY_ID",
"PRODUCT_DIM"."ITEM_BUYER" "ITEM_BUYER",
"PRODUCT_DIM"."ITEM_DSC" "ITEM_DSC",
"PRODUCT_DIM"."ITEM_ID" "ITEM_ID",
"PRODUCT_DIM"."ITEM_MARKETING_MANAGER" "ITEM_MARKETING_MANAGER",
"PRODUCT_DIM"."TOTAL_PRODUCT_DSC" "TOTAL_PRODUCT_DSC",
"PRODUCT_DIM"."TOTAL_PRODUCT_ID" "TOTAL_PRODUCT_ID"
FROM "PRODUCT_DIM" "PRODUCT_DIM"
);

One point to note here is that the NOLOGGING clause doesn’t mean that
every
DML operations on the table works in NOLOGGING mode - it’s only bulk
operations like INSERT /*+ APPEND */ and Direct Loader (SQL*Loader) that take
advantage of it. Regular INSERTs, DELETEs and UPDATEs will generate normal redo,
even when you specify NOLOGGING. This is a popular misconception as logic would
suggest that the NOLOGGING table creation clause would suppress all redo log
generation, not just direct path loads, but all the NOLOGGING clause does is
specify NOLOGGING for the table creation process itself (in case you’re using
CREATE TABLE … AS SELECT) and for any subsequent direct path insertions.

Another misconception is that NOLOGGING suppresses all redo, even when using
direct path loads. What actually happens is that NOLOGGING operations generate
minimal redo, not no redo whatsoever, with this minimal redo generated so that
the data dictionary itself is preserved. It’s only a relatively small amount,
but it’s worth noting that some will be produced, whatever steps you take.

With indexes and partition creation, it’s a similar story - you can specify
NOLOGGING in the object creation clause, and then certain operations (in the
case of indexes, creation and ALTERs (rebuilds)) can take advantage of NOLOGGING.

So what about tablespaces? What do you get when you specify the NOLOGGING
clause when creating a tablespace?

CREATE TABLESPACE dims
NOLOGGING
DATAFILE ‘c:\oradata\dims01.dbf’ SIZE 50M
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 256K
SEGMENT SPACE MANAGEMENT AUTO
ONLINE;

What this actually does is ensure that the default LOGGING|NOLOGGING
attribute for objects created in this tablespace is set to NOLOGGING. You can
still subsequently create a table as LOGGING in a tablespace you define as
NOLOGGING, all the NOLOGGING attribute does for a tablespace is set the default
for objects later created in the tablespace.

Once you’ve bulk loaded your data, created your table using CREATE TABLE …
AS SELECT or rebuilt your index, the important next step is to perform a
baseline backup of the datafiles that contain your objects; once you’ve done
this, and until such time as you carry out any other unrecoverable DML, you can
then recover your objects like any other. This baseline backup

can be a hot backup
(as long as you’re in ARCHIVELOG mode, which in our case
we are) and this backup then ensures that we have a baseline copy of the object
within our backup set if we then need to perform a full recovery.

So where does the ARCHIVELOG setting for databases come into things?

The general advice for setting up warehouse target databases is to put the
database into NOARCHIVELOG mode. NOARCHIVELOG suppresses the archiving of redo
log files and improves the performance of data loads as you don’t have the ARCn
processes running around in the background and holding things up as they copy
the redo log files off into your archive locations. The obvious cost of this is
that you will only have limited options when recovering your instance -
basically if you’ve overwritten your redo log files and not had the contents
archived off, and you have a media failure, you can only restore your instance
back to the last full cold backup you took (as the redo required to roll forward
from this point has probably been overwritten, and you didn’t archive it off
beforehand). However, for data warehouses, this isn’t usually an issue, as you
can just reload from the source files or databases, and the benefits of your
data load being quicker are usually greater than the (slim) chance that you’ll
need to perform a media recovery.

What we have here then are two separate concepts - ARCHIVELOG and
NOARCHIVELOG mode, which determines whether redo logs are archived off when they
fill up, or just overwritten and the redo then lost, and LOGGING | NOLOGGING,
where we either allow all redo to be generated as normal, or in certain
circumstances (the important ones, when we need to load lots of data) suppress
the majority of redo, at the cost of our operation being unrecoverable and
subsequently needing to reload the objects from the source files. These two
concepts interact in two ways:

  • If we’re looking to suppress as much redo as possible, chances are we’d
    like to suppress archiving as well (and we might as well, as the objects
    we’re loading are unrecoverable anyway), and
  • If you specify NOARCHIVELOG mode for a database, all indexes, tables and
    partitions are

    automatically set to NOLOGGING
    , and therefore all INSERT /*+ APPEND */
    and SQL*Loader direct load operations will automatically generate minimal
    redo.

However, in our instance, the DBAs want to keep the database in ARCHIVELOG
mode, so that they can manage the database in the same way as the OLTP databases
and ship the archivelogs off to their DR site to maintain their standby
databases. As I was reading through the various articles and questions on Asktom,
I noticed

this answer
which suggested that there could be some complications when
using NOLOGGING operations when you later on wanted to ship the archivelogs to a
standby database:

"when using a standby database for disaster recovery — you cannot
perform nologging operations without performing additional, manual work (in
fact, in 9i,
we can set a flag that says "ignore the request for nologging — log it
anyway" on the database to avoid this issue alltogether).

Please read the standby database documention — it does explain this in some
detail:"

The answer then points through to the

Oracle 8.1.6 documentation for Standby Database
which talks about the
problems you’ll get when you try and apply the (minimal) redo logs that were
written for our unrecoverable transactions to your standby database:

"In some SQL statements, the user has the option of specifying
the NOLOGGING clause, which indicates that the database operation is not
logged in the redo log file. Even though the user specifies the NOLOGGING
clause, a redo log record is still written to the redo log. However, when
the redo log file is transferred to the standby site and applied to the
standby database, a portion of the datafile is unusable and marked as being
unrecoverable. When you either activate the standby database, or open the
standby database with the read-only option, and attempt to read the range of
blocks that are marked as "UNRECOVERABLE," you will see error messages
similar to the following:

ORA-01578: ORACLE data block corrupted (file # 1, block # 2521)
ORA-01110: data file 1: '/vobs/oracle/dbs/stdby/tbs_1.f'
ORA-26040: Data block was loaded using the NOLOGGING option"

good, and certainly I can’t imagine our DBA chuckling to
himself when at 2.30 in the morning he’s called to recover our database and that
error message comes up. It’s not as bad as it seems actually, as the article
then goes on to explain how you recover from this situation, but the point that
Tom was trying to make is that NOLOGGING isn’t something you should do and then
tell no-one about - you should understand what it does, talk it through with the
DBA and have a plan in mind for when you do eventually need to perform a
recovery and work around the fact that you may have some objects that are
unrecoverable - if you’ve done your baseline backup after your bulk load, it
shouldn’t affect you, but if your data centre gets hit by a meteorite then it
may well affect the process of starting up your standby database. Given that
Standby Database is now Data Guard with 10g, I checked the 9i Data Guard
documentation (the 10g documentation seems to be offline at the moment) and
found this comment, which suggests that in fact we might want
to actually reconsider our use of NOLOGGING operations:

“For logical standby databases, when SQL apply operations
encounter a redo log record for an operation performed with the NOLOGGING clause, it skips over the record and continues applying
changes from later records. Later, if an attempt is made to access one of
the records that were updated with NOLOGGING in effect, the
following error is returned: ORA-01403 no data found

To recover after the NOLOGGING clause is
specified, re-create one or more tables from the primary database, as
described in
Section 9.1.6.

Note: In general, use of the NOLOGGING
clause is not recommended. Optionally, if you know in advance
that operations using the NOLOGGING clause will be
performed on certain tables in the primary database, you might
want to prevent the application of SQL statements associated
with these tables to the logical standby database by using the
DBMS_LOGSTDBY.SKIP procedure."

So, in summary, in our situation where we want to minimise the amount of redo
generated for our dimension and fact table loads, but we want to work in
ARCHIVELOG mode and ship our logs to a standby database, I’d say the best
approach would be to:

  • Keep the database in ARCHIVELOG mode
  • Create separate tablespaces, with their own datafiles, for the
    dimensions and facts that we are going to bulk load, and create these
    tablespaces as NOLOGGING
  • Create the tables to implement these dimensions and facts as normal
    (i.e. without the NOLOGGING clause, as this is already the default for their
    tablespaces)
  • Use INSERT /*+ APPEND */ to
    bulk load our data into these tables
  • Once the loads have completed, do a baseline hot backup of the datafiles
    that relate to the tables/tablespaces we’ve just loaded
  • At this point, our main database is completely recoverable.
  • If we need to recover the standby database, be prepared for the fact
    that some manual work might be needed to activate the standby database, and
    in fact be prepared for the DBA to actually stop us from using NOLOGGING
    through the use of the DBMS_LOGSTDBY.SKIP procedure.

Any comments?

Getting To The Bottom Of NOLOGGING, NOARCHIVELOG and Standby Databases

Posted on the May 28th, 2005. Read times

Source: Mark Rittman's Oracle Weblog [link]

One of the customers I work with is looking to us for advice on how to set
their data warehousing database up. They’re looking to use the Enterprise
Edition of the latest Oracle version (10.1.0.4) and are planning on using Oracle
Warehouse Builder 10g as the ETL tool. Each environment (dev, test, prod) will
have two database instances, one to hold the OWB Design repository (with an 8k
block size) and one to hold the target data warehouse and the OWB Runtime
Repository (with a 16K block size). Setting up the design repository instance
isn’t an issue, it’s just a regular database set up using the default database
template, but the target data warehouse instance is a bit more interesting as
we’re looking to minimize the amount of redo log generated during the warehouse
initial build and then refresh.

Most people are aware that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces which under certain circumstances
causes redo generation to be switched off when the object is loaded. You’re also
probably aware that the database itself can be placed in NOARCHIVELOG mode,
which disables the process of redo log archiving which can otherwise slow down a
bulk load into a warehouse. However, what’s often not clear is which of these
options should be used and why you would use one rather than the other, and I
therefore thought it worthwhile going back over the documentation and trying to
set out what the optimal set of choices are. As usual, if you’re reading this
and you spot something that I’ve got wrong, or can add anything that makes the
process clearer, add a comment at the end and I’ll update the posting.

To put this in context, the client runs a number of other Oracle databases,
almost exclusively as the backend for their transactional systems, and the
general rule is that these databases are run in ARCHIVELOG mode, incremental hot
backups are taken during the week using RMAN followed by full hot backups at the
weekend, and the archived redo logs are stored off site and also used to refresh
their DR databases. Given a preference, they would rather run the new data
warehouse database in ARCHIVELOG mode, so that it can fit in with their existing
arrangements, but they are willing to perform the warehouse load and refresh in
NOLOGGING mode as long as we give them some downtime after the load to do a
full, cold backup of the warehouse.

What we would like to do is to somehow disable redo log generation for
initial loads into our dimension tables, and for initial and then subsequent
loads into the fact tables. The rationale for this is that the initial dimension
load, and all loads into the fact table, load a very large amount of data and we
can speed up this process if we don’t have to write redo information into the
redo log files - in other words, we dramatically reduce the amount of disk I/O
we need to perform to load a given object. If our database is running in
ARCHIVELOG mode, we get an additional benefit in that we don’t then need to
archive off these redo log files, again reducing the time to perform the load as
we don’t have to wait around for the ARCn process to archive the filled-up redo
log files. The downside of doing this though is that these data loads are
unrecoverable - that is, if the instance crashes or we have a media (disk)
failure, we wouldn’t be able to recoverer these objects. Whilst that isn’t a bit
deal for the objects we’re working with - we can just reload them from the
source files - it means that we’re going to have problems with any other objects
that are also in that datafile. What this means in practice is that we have to
do a full, cold backup of the datafiles that contain these objects before we can
consider the database "recoverable".

I mentioned earlier on that you can specify a NOLOGGING clause when creating
tables, indexes, partitions and tablespaces. Ignoring tablespaces for the
moment, what this means in practice is that you can create the table (for
example) using the NOLOGGING clause:

CREATE TABLE "PRODUCT"
(
"CLASS_DSC" VARCHAR2(50),
"CLASS_ID" NUMBER,
"FAMILY_DSC" VARCHAR2(50),
"FAMILY_ID" NUMBER,
"ITEM_BUYER" VARCHAR2(50),
"ITEM_DSC" VARCHAR2(50),
"ITEM_ID" NUMBER,
"ITEM_MARKETING_MANAGER" VARCHAR2(50),
"TOTAL_PRODUCT_DSC" VARCHAR2(50),
"TOTAL_PRODUCT_ID" NUMBER)
NOLOGGING
;

Then, you can carry out bulk loads into this table in NOLOGGING mode. For
example:

INSERT
/*+ APPEND */
INTO
"PRODUCT"
("CLASS_DSC",
"CLASS_ID",
"FAMILY_DSC",
"FAMILY_ID",
"ITEM_BUYER",
"ITEM_DSC",
"ITEM_ID",
"ITEM_MARKETING_MANAGER",
"TOTAL_PRODUCT_DSC",
"TOTAL_PRODUCT_ID")
(SELECT
"PRODUCT_DIM"."CLASS_DSC" "CLASS_DSC",
"PRODUCT_DIM"."CLASS_ID" "CLASS_ID",
"PRODUCT_DIM"."FAMILY_DSC" "FAMILY_DSC",
"PRODUCT_DIM"."FAMILY_ID" "FAMILY_ID",
"PRODUCT_DIM"."ITEM_BUYER" "ITEM_BUYER",
"PRODUCT_DIM"."ITEM_DSC" "ITEM_DSC",
"PRODUCT_DIM"."ITEM_ID" "ITEM_ID",
"PRODUCT_DIM"."ITEM_MARKETING_MANAGER" "ITEM_MARKETING_MANAGER",
"PRODUCT_DIM"."TOTAL_PRODUCT_DSC" "TOTAL_PRODUCT_DSC",
"PRODUCT_DIM"."TOTAL_PRODUCT_ID" "TOTAL_PRODUCT_ID"
FROM "PRODUCT_DIM" "PRODUCT_DIM"
);

One point to note here is that the NOLOGGING clause doesn’t mean that
every
DML operations on the table works in NOLOGGING mode - it’s only bulk
operations like INSERT /*+ APPEND */ and Direct Loader (SQL*Loader) that take
advantage of it. Regular INSERTs, DELETEs and UPDATEs will generate normal redo,
even when you specify NOLOGGING. This is a popular misconception as logic would
suggest that the NOLOGGING table creation clause would suppress all redo log
generation, not just direct path loads, but all the NOLOGGING clause does is
specify NOLOGGING for the table creation process itself (in case you’re using
CREATE TABLE … AS SELECT) and for any subsequent direct path insertions.

Another misconception is that NOLOGGING suppresses all redo, even when using
direct path loads. What actually happens is that NOLOGGING operations generate
minimal redo, not no redo whatsoever, with this minimal redo generated so that
the data dictionary itself is preserved. It’s only a relatively small amount,
but it’s worth noting that some will be produced, whatever steps you take.

With indexes and partition creation, it’s a similar story - you can specify
NOLOGGING in the object creation clause, and then certain operations (in the
case of indexes, creation and ALTERs (rebuilds)) can take advantage of NOLOGGING.

So what about tablespaces? What do you get when you specify the NOLOGGING
clause when creating a tablespace?

CREATE TABLESPACE dims
NOLOGGING
DATAFILE ‘c:\oradata\dims01.dbf’ SIZE 50M
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 256K
SEGMENT SPACE MANAGEMENT AUTO
ONLINE;

What this actually does is ensure that the default LOGGING|NOLOGGING
attribute for objects created in this tablespace is set to NOLOGGING. You can
still subsequently create a table as LOGGING in a tablespace you define as
NOLOGGING, all the NOLOGGING attribute does for a tablespace is set the default
for objects later created in the tablespace.

Once you’ve bulk loaded your data, created your table using CREATE TABLE …
AS SELECT or rebuilt your index, the important next step is to perform a
baseline backup of the datafiles that contain your objects; once you’ve done
this, and until such time as you carry out any other unrecoverable DML, you can
then recover your objects like any other. This baseline backup

can be a hot backup
(as long as you’re in ARCHIVELOG mode, which in our case
we are) and this backup then ensures that we have a baseline copy of the object
within our backup set if we then need to perform a full recovery.

So where does the ARCHIVELOG setting for databases come into things?

The general advice for setting up warehouse target databases is to put the
database into NOARCHIVELOG mode. NOARCHIVELOG suppresses the archiving of redo
log files and improves the performance of data loads as you don’t have the ARCn
processes running around in the background and holding things up as they copy
the redo log files off into your archive locations. The obvious cost of this is
that you will only have limited options when recovering your instance -
basically if you’ve overwritten your redo log files and not had the contents
archived off, and you have a media failure, you can only restore your instance
back to the last full cold backup you took (as the redo required to roll forward
from this point has probably been overwritten, and you didn’t archive it off
beforehand). However, for data warehouses, this isn’t usually an issue, as you
can just reload from the source files or databases, and the benefits of your
data load being quicker are usually greater than the (slim) chance that you’ll
need to perform a media recovery.

What we have here then are two separate concepts - ARCHIVELOG and
NOARCHIVELOG mode, which determines whether redo logs are archived off when they
fill up, or just overwritten and the redo then lost, and LOGGING | NOLOGGING,
where we either allow all redo to be generated as normal, or in certain
circumstances (the important ones, when we need to load lots of data) suppress
the majority of redo, at the cost of our operation being unrecoverable and
subsequently needing to reload the objects from the source files. These two
concepts interact in two ways:

  • If we’re looking to suppress as much redo as possible, chances are we’d
    like to suppress archiving as well (and we might as well, as the objects
    we’re loading are unrecoverable anyway), and
  • If you specify NOARCHIVELOG mode for a database, all indexes, tables and
    partitions are

    automatically set to NOLOGGING
    , and therefore all INSERT /*+ APPEND */
    and SQL*Loader direct load operations will automatically generate minimal
    redo.

However, in our instance, the DBAs want to keep the database in ARCHIVELOG
mode, so that they can manage the database in the same way as the OLTP databases
and ship the archivelogs off to their DR site to maintain their standby
databases. As I was reading through the various articles and questions on Asktom,
I noticed

this answer
which suggested that there could be some complications when
using NOLOGGING operations when you later on wanted to ship the archivelogs to a
standby database:

"when using a standby database for disaster recovery — you cannot
perform nologging operations without performing additional, manual work (in
fact, in 9i,
we can set a flag that says "ignore the request for nologging — log it
anyway" on the database to avoid this issue alltogether).

Please read the standby database documention — it does explain this in some
detail:"

The answer then points through to the

Oracle 8.1.6 documentation for Standby Database
which talks about the
problems you’ll get when you try and apply the (minimal) redo logs that were
written for our unrecoverable transactions to your standby database:

"In some SQL statements, the user has the option of specifying
the NOLOGGING clause, which indicates that the database operation is not
logged in the redo log file. Even though the user specifies the NOLOGGING
clause, a redo log record is still written to the redo log. However, when
the redo log file is transferred to the standby site and applied to the
standby database, a portion of the datafile is unusable and marked as being
unrecoverable. When you either activate the standby database, or open the
standby database with the read-only option, and attempt to read the range of
blocks that are marked as "UNRECOVERABLE," you will see error messages
similar to the following:

ORA-01578: ORACLE data block corrupted (file # 1, block # 2521)
ORA-01110: data file 1: '/vobs/oracle/dbs/stdby/tbs_1.f'
ORA-26040: Data block was loaded using the NOLOGGING option"

good, and certainly I can’t imagine our DBA chuckling to
himself when at 2.30 in the morning he’s called to recover our database and that
error message comes up. It’s not as bad as it seems actually, as the article
then goes on to explain how you recover from this situation, but the point that
Tom was trying to make is that NOLOGGING isn’t something you should do and then
tell no-one about - you should understand what it does, talk it through with the
DBA and have a plan in mind for when you do eventually need to perform a
recovery and work around the fact that you may have some objects that are
unrecoverable - if you’ve done your baseline backup after your bulk load, it
shouldn’t affect you, but if your data centre gets hit by a meteorite then it
may well affect the process of starting up your standby database. Given that
Standby Database is now Data Guard with 10g, I checked the 9i Data Guard
documentation (the 10g documentation seems to be offline at the moment) and
found this comment, which suggests that in fact we might want
to actually reconsider our use of NOLOGGING operations if

“For logical standby databases, when SQL apply operations
encounter a redo log record for an operation performed with the NOLOGGING clause, it skips over the record and continues applying
changes from later records. Later, if an attempt is made to access one of
the records that were updated with NOLOGGING in effect, the
following error is returned: ORA-01403 no data found

To recover after the NOLOGGING clause is
specified, re-create one or more tables from the primary database, as
described in
Section 9.1.6.

e>



Note:

In general, use of the NOLOGGING
clause is not recommended. Optionally, if you know in advance
that operations using the NOLOGGING clause will be
performed on certain tables in the primary database, you might
want to prevent the application of SQL statements associated
with these tables to the logical standby database by using the
DBMS_LOGSTDBY.SKIP procedure."

, use of the NOLOGGING clause is not recommended. Optionally, if
you know in advance that operations using the NOLOGGING clause will be performed
on certain tables in the primary database, you might want to prevent the
application of SQL statements associated with these tables to the logical
standby database by using the DBMS_LOGSTDBY.SKIP procedure.
So, in summary, in our situation where we want to minimise the amount of redo
generated for our dimension and fact table loads, but we want to work in
ARCHIVELOG mode and ship our logs to a standby database, I’d say the best
approach would be to:

  • Keep the database in ARCHIVELOG mode
  • Create separate tablespaces, with their own datafiles, for the
    dimensions and facts that we are going to bulk load, and create these
    tablespaces as NOLOGGING
  • Create the tables to implement these dimensions and facts as normal
    (i.e. without the NOLOGGING clause, as this is already the default for their
    tablespaces)
  • Use INSERT /*+ APPEND */ to
    bulk load our data into these tables
  • Once the loads have completed, do a baseline hot backup of the datafiles
    that relate to the tables/tablespaces we’ve just loaded
  • At this point, our main database is completely recoverable.
  • If we need to recover the standby database, be prepared for the fact
    that some manual work might be needed to activate the standby database, and
    in fact be prepared for the DBA to actually stop us from using NOLOGGING
    through the use of the DBMS_LOGSTDBY.SKIP procedure.

Any comments?

SOA styles | InfoWorld | Column | 2005-05-25 | By Jon Udell

Posted on the May 27th, 2005. Read times

Source: OLAP/BI/IM stuff [link]

SOA styles | InfoWorld | Column | 2005-05-25 | By Jon Udell: “I don?t often emerge from a conference with a single overwhelming insight, but in this case that?s just what happened. The juxtaposition of the two events helped me reconcile a deep schism between two factions, which I?ll call the WS-* and Web 2.0 camps. The argument, which revolves around pairs of opposing and overloaded words ? simplicity vs. complexity, decentralization vs. centralization, agility vs. stability ? has been going on for years, but it?s gotten really loud in recent months. ”

I’ll be posting more SOA stuff in the future - this is an interesting debate… think the alphabet soup of WS-* (which hardly anyone knows all of), and the simpler, plain old XML via HTTP constructs, used by google, Yahoo, Amazon, and other mainstream companies. As a part-time hacker, I personally prefer the straight up HTTP options… Nothing like whipping up some cool perl script to make some magic.

In fact it seems to the only real place for WS-* is in orchestration…. if services will be consumed by their end-point, XML over HTTP is far simpler (and far faster). If you are automating a long process with several decision points though, and if it may change in the near future, straight up scripting may turn into a hinderance…

Sales Analytics: Driving Greater Sales Effectiveness and Increased Results

Posted on the May 26th, 2005. Read times

Source: ITPapers.com - Recent Business Intelligence / Data Warehousing White Papers [link]

Learn how to get more from your SFA system in this white paper from Siebel Systems. Find out how enhancing your system with analytics can help you go beyond standard sales reports to understand in real time the forces affecting sales outcomes.

Next Page »