To write efficient Transformer stage derivations, it helps to understand which items are evaluated and when.
The evaluation sequence for a Transformer stage is:
Evaluate each stage variable initial value
For each input row to process:
    Evaluate each stage variable derivation value, unless the derivation is empty
    For each output link:
        Evaluate constraint, if true:
            Evaluate each column derivation value
            Write the output row
    Next output link
Next input row
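The sequence above can be modelled in a few lines of Python. This is only a sketch for reasoning about evaluation counts, not how the Transformer stage is implemented. The function name `run_transformer`, the tuple layouts, and the sample rows, variables, and links are all hypothetical. An empty derivation is represented by `None`.

```python
def run_transformer(rows, stage_vars, links):
    """Model the evaluation sequence of a Transformer stage without a loop.

    stage_vars: list of (name, initial_fn, derivation_fn or None), in display order
    links:      list of (name, constraint_fn, [(column, derivation_fn), ...]),
                in display order
    Returns a dict mapping each link name to the list of rows written to it.
    """
    # Initial values are evaluated once, before any input row is processed.
    sv = {name: initial() for name, initial, _ in stage_vars}
    outputs = {name: [] for name, _, _ in links}

    for row in rows:
        # Stage variable derivations run once per input row, unless empty.
        for name, _, derivation in stage_vars:
            if derivation is not None:
                sv[name] = derivation(row, sv)
        for name, constraint, columns in links:
            # Column derivations run only when the link constraint is true.
            if constraint(row, sv):
                outputs[name].append({col: derive(row, sv) for col, derive in columns})
    return outputs


# Hypothetical sample data: two input rows, two stage variables, two output links.
rows = [{"col1": "001A"}, {"col1": "002B"}]
stage_vars = [
    ("Pad", lambda: " " * 20, None),  # initial value only, empty derivation
    ("IsMatch", lambda: 0, lambda row, sv: 1 if row["col1"][:3] == "001" else 0),
]
links = [
    ("matched", lambda row, sv: sv["IsMatch"] == 1,
     [("code", lambda row, sv: row["col1"][:3]), ("pad", lambda row, sv: sv["Pad"])]),
    ("everything", lambda row, sv: True,
     [("raw", lambda row, sv: row["col1"])]),
]

result = run_transformer(rows, stage_vars, links)
print(result["matched"])     # one row: only "001A" passes the constraint
print(result["everything"])  # two rows: the constraint is always true
```

In this model the `Pad` stage variable is computed exactly once, while `IsMatch` is recomputed for every input row.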
The evaluation sequence for a Transformer stage that has a loop condition defined is:
Evaluate each stage variable initial value
For each input row to process:
    Evaluate each stage variable derivation value (unless empty)
    Evaluate each loop variable initial value
    While the evaluated loop condition is true:
        Evaluate each loop variable derivation value (unless empty)
        For each output link:
            Evaluate constraint, if true:
                Evaluate each column derivation value
                Write the output row
        Next output link
    Loop back to While
Next input row
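The looping sequence can be sketched the same way in Python. Again this is an illustrative model under stated assumptions, not the stage's implementation. The name `run_looping_transformer`, the 1-based `iteration` counter passed to the loop condition, and the sample data (one output row per comma-separated element) are all hypothetical.

```python
def run_looping_transformer(rows, stage_vars, loop_vars, loop_condition, links):
    """Model the evaluation sequence of a Transformer stage with a loop condition.

    stage_vars / loop_vars: list of (name, initial_fn, derivation_fn or None)
    loop_condition:         fn(row, sv, lv, iteration) -> bool
    links:                  list of (name, constraint_fn, [(column, derivation_fn), ...])
    """
    # Stage variable initial values: once, before any input row.
    sv = {name: initial() for name, initial, _ in stage_vars}
    outputs = {name: [] for name, _, _ in links}

    for row in rows:
        # Stage variable derivations: once per input row, unless empty.
        for name, _, derivation in stage_vars:
            if derivation is not None:
                sv[name] = derivation(row, sv)

        # Loop variable initial values: re-evaluated for every input row.
        lv = {name: initial() for name, initial, _ in loop_vars}

        iteration = 1  # assumed counter, used only by this sketch
        while loop_condition(row, sv, lv, iteration):
            # Loop variable derivations: once per loop pass, unless empty.
            for name, _, derivation in loop_vars:
                if derivation is not None:
                    lv[name] = derivation(row, sv, lv, iteration)
            for name, constraint, columns in links:
                if constraint(row, sv, lv):
                    outputs[name].append(
                        {col: derive(row, sv, lv) for col, derive in columns}
                    )
            iteration += 1
    return outputs


# Hypothetical sample: emit one output row per element of a delimited field.
rows = [{"id": 1, "items": "a,b,c"}, {"id": 2, "items": "x"}]
stage_vars = [("Parts", lambda: [], lambda row, sv: row["items"].split(","))]
loop_vars = [("Item", lambda: "", lambda row, sv, lv, i: sv["Parts"][i - 1])]
links = [
    ("out", lambda row, sv, lv: True,
     [("id", lambda row, sv, lv: row["id"]), ("item", lambda row, sv, lv: lv["Item"])]),
]

result = run_looping_transformer(
    rows, stage_vars, loop_vars,
    lambda row, sv, lv, i: i <= len(sv["Parts"]),
    links,
)
print(result["out"])  # three rows for id 1, then one row for id 2
```

The split is done in a stage variable, so it runs once per input row, whereas everything inside the `while` body runs once per loop pass.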
The stage variables, loop variables, and the columns within a link are evaluated in the order in which they are displayed on the parallel job canvas. Similarly, the output links are also evaluated in the order in which they are displayed.
Examples
Certain constructs are inefficient if they are included in output column derivations, because they are evaluated once for every output column that uses them. The following examples describe these constructs:
- The same part of an expression is used in multiple column derivations.
  For example, if you want to use the same substring of an input column in multiple output columns, you might use the following test in several output column derivations:
      IF (DSLink1.col1[1,3] = "001") THEN ...
  In this case, the evaluation of the substring DSLink1.col1[1,3] is repeated for each column that uses it. The evaluation can be made more efficient by moving the substring calculation into a stage variable. The substring is then evaluated once for every input row. In this example, the stage variable StageVar1 is defined as follows:
      DSLink1.col1[1,3]
  Each column derivation starts with this test:
      IF (StageVar1 = "001") THEN ...
  This example can be improved further by also moving the string comparison into the stage variable. The stage variable is then defined as follows:
      IF (DSLink1.col1[1,3] = "001") THEN 1 ELSE 0
  Each column derivation starts with this test:
      IF (StageVar1) THEN
  The improved construct reduces both substring function evaluations and string comparisons.
- An expression includes calculated constant values.
  For example, a column definition might include a function call that returns a constant value:
      Str(" ", 20)
  This function returns a string of 20 spaces. The function is evaluated every time the column derivation is evaluated. It is more efficient to calculate the constant value once. You can assign an initial value to a stage variable on the Variables tab of the Stage Properties window. The initial value is set by using this expression:
      Str(" ", 20)
  You do not supply a derivation for the stage variable on the main Transformer page. The initial value of the stage variable is evaluated once, before any input rows are processed. Because the derivation expression of the stage variable is empty, the stage variable is not re-evaluated for each input row. Change any expression that previously used the function Str(" ", 20) to use the stage variable instead.
  The same considerations apply to any expression, or part of an expression, that generates a constant value. For example, the following expression concatenates two strings:
      "abc" : "def"
  The abcdef concatenation is repeated every time the column derivation is evaluated. Because this part of the expression is constant, it can likewise be moved into a stage variable, using the initial value setting to perform the concatenation once.
- An expression that requires a type conversion is used as a constant, or it is used in multiple places.
  For example, an expression might include the following code:
      DSLink1.col1+"1"
  In this example, the "1" is a string constant, and so must be converted from a string to an integer each time the expression is evaluated. The solution in this case is to change the constant from a string to an integer:
      DSLink1.col1+1
  If DSLink1.col1 is a string field, however, then a conversion is required every time the expression is evaluated. If an input column is used in more than one expression, and it requires the same type conversion in each expression, it is more efficient to use a stage variable to perform the conversion once. For this example, you can create an integer stage variable, specify its derivation to be DSLink1.col1, and then use the stage variable in place of DSLink1.col1 wherever that conversion is required. When you use stage variables to evaluate parts of expressions, you must set the data type of the stage variable correctly for that context. Otherwise, needless conversions are required wherever that variable is used.
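The saving described in the first example can be made concrete by counting operations in a small Python model. This is a rough sketch, not a measurement of a real job. The helpers `first_three` and `equals` stand in for the substring DSLink1.col1[1,3] and the comparison with "001". The column count and the input values are made up.

```python
from collections import Counter

calls = Counter()  # tallies how often each operation runs


def first_three(value):
    """Stand-in for the substring DSLink1.col1[1,3] (first three characters)."""
    calls["substring"] += 1
    return value[:3]


def equals(left, right):
    """Stand-in for the string comparison against "001"."""
    calls["compare"] += 1
    return left == right


COLUMN_COUNT = 3  # hypothetical number of output columns sharing the test
rows = ["001AAA", "002BBB", "001CCC", "003DDD"]  # made-up col1 values


def derive_in_every_column(col1_values):
    """Each output column repeats the substring and the comparison."""
    out = []
    for col1 in col1_values:
        out.append(["hit" if equals(first_three(col1), "001") else "miss"
                    for _ in range(COLUMN_COUNT)])
    return out


def derive_with_stage_variable(col1_values):
    """The test is evaluated once per row; columns only read the 1/0 flag."""
    out = []
    for col1 in col1_values:
        stage_var1 = 1 if equals(first_three(col1), "001") else 0
        out.append(["hit" if stage_var1 else "miss" for _ in range(COLUMN_COUNT)])
    return out


calls.clear()
naive_out = derive_in_every_column(rows)
naive_calls = dict(calls)

calls.clear()
optimized_out = derive_with_stage_variable(rows)
optimized_calls = dict(calls)

print(naive_calls)      # {'substring': 12, 'compare': 12}
print(optimized_calls)  # {'substring': 4, 'compare': 4}
```

Both versions produce the same output rows. The per-row cost of the shared test drops from one evaluation per column to one evaluation per row, and the same reasoning applies to repeated type conversions.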