Monday, 27 January 2014

Partitioner insertion, sort insertion

Partitioner insertion and sort insertion each make a flow easier to write, because the user no longer has to think about how the data is partitioned or sorted. The parallel engine examines the requirements of each operator in the flow and inserts partitioners, collectors and sorts wherever they are needed.
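To make the idea concrete, here is a toy Python model of that decision. It is not how the parallel engine is implemented. The classes `DataProps` and `Requirement` and the function `plan_insertions` are hypothetical, and key matching is deliberately simplified to an exact match. Given what is known about the data on a link and what the next operator needs, the model lists the components that would be inserted.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical toy model; not the parallel engine's real planner.
@dataclass
class DataProps:
    """What is known about the data arriving on a link."""
    part_method: str = "any"  # e.g. "hash", or "any" when nothing is known
    part_keys: List[str] = field(default_factory=list)
    sort_keys: List[str] = field(default_factory=list)


@dataclass
class Requirement:
    """What an operator needs from its input; empty lists mean no need."""
    part_keys: List[str] = field(default_factory=list)
    sort_keys: List[str] = field(default_factory=list)


def plan_insertions(have: DataProps, need: Requirement) -> List[str]:
    """List the components a planner would insert ahead of the operator."""
    inserted: List[str] = []
    repartitioned = False
    # Simplification: only hash partitioning on exactly the same keys counts.
    if need.part_keys and not (have.part_method == "hash"
                               and have.part_keys == need.part_keys):
        inserted.append(f"hash partitioner on {need.part_keys}")
        repartitioned = True
    # Repartitioning interleaves rows, so any earlier sort order is lost.
    # Otherwise the data is sorted enough if the required keys are a prefix.
    already_sorted = (not repartitioned and
                      have.sort_keys[:len(need.sort_keys)] == need.sort_keys)
    if need.sort_keys and not already_sorted:
        inserted.append(f"sort on {need.sort_keys}")
    return inserted


if __name__ == "__main__":
    join_needs = Requirement(part_keys=["cust_id"], sort_keys=["cust_id"])
    # Nothing known about the input: a partitioner and a sort are added.
    print(plan_insertions(DataProps(), join_needs))
    # Input already hash-partitioned and sorted on cust_id: nothing is added.
    ready = DataProps("hash", ["cust_id"], ["cust_id", "order_date"])
    print(plan_insertions(ready, join_needs))
```

Note that whenever the sketch inserts a partitioner it also inserts a sort, because hash repartitioning interleaves rows from different input partitions and so discards any earlier order.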
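However, there are some situations where these features can be a hindrance: for example, when they repartition or re-sort data that is already correctly arranged, which adds processing the job does not need.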
If your data is already partitioned and sorted, but the InfoSphere® DataStage® job has no way of knowing this, you can disable automatic partitioning and sorting for the whole job by setting the following environment variables when the job runs:
  • APT_NO_PART_INSERTION
  • APT_NO_SORT_INSERTION
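Before turning insertion off, it is worth confirming that the data really is arranged the way the job will now assume. The following Python sketch checks a sample of rows, grouped by the partition they arrive in, for two properties: every row with a given key value sits in a single partition, and each partition is in ascending key order. The function name `check_prepartitioned_and_sorted` and the row-dictionary input format are assumptions for illustration; extracting such a sample from your source is outside its scope.

```python
from typing import Any, Dict, List, Sequence


def check_prepartitioned_and_sorted(
        partitions: Sequence[Sequence[Dict[str, Any]]], key: str) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means OK).

    partitions: one sequence of rows per partition, in delivery order.
    key: the column the job will treat as partitioned and sorted (ascending).
    """
    problems: List[str] = []
    owner: Dict[Any, int] = {}  # key value -> first partition that held it
    for p, rows in enumerate(partitions):
        values = [row[key] for row in rows]
        # Sorted check: each value must be >= the one before it.
        for i, (prev, cur) in enumerate(zip(values, values[1:]), start=1):
            if cur < prev:
                problems.append(
                    f"partition {p}: row {i} out of order ({prev!r} > {cur!r})")
        # Partitioning check: a key value must not span partitions.
        # dict.fromkeys de-duplicates while keeping a stable order.
        for v in dict.fromkeys(values):
            first = owner.setdefault(v, p)
            if first != p:
                problems.append(f"key {v!r} appears in partitions {first} and {p}")
    return problems


if __name__ == "__main__":
    sample = [[{"k": 1}, {"k": 3}, {"k": 2}], [{"k": 1}]]
    for problem in check_prepartitioned_and_sorted(sample, "k"):
        print(problem)
```

An empty result only shows that the sample is consistent; it cannot prove the whole data set is.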
You can also disable partitioning on a per-link basis within your job design by explicitly setting a partitioning method of Same on the Input page > Partitioning tab of the stage that the link feeds.
To disable sorting on a per-link basis, insert a Sort stage on the link and set its Sort Key Mode option to Don't Sort (Previously Sorted).
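The job-wide and per-link controls can be pictured together in a small Python model. It is an illustrative sketch, not DataStage code: the function `insertion_allowed` is hypothetical, it assumes that the mere presence of either environment variable switches the feature off, and it assumes that any explicit partitioning method (not only Same) replaces the default automatic setting under which the engine chooses for itself. The setting names are illustrative strings.

```python
import os
from typing import Dict, Mapping, Optional

DONT_SORT = "Don't Sort (Previously Sorted)"


def insertion_allowed(link_partitioning: str = "(Auto)",
                      sort_key_mode: Optional[str] = None,
                      env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Hypothetical model: may the engine add a partitioner or sort on a link?

    link_partitioning: method chosen on the link's Partitioning tab.
    sort_key_mode: Sort Key Mode of a Sort stage on the link, or None if
        there is no Sort stage.
    env: environment the job runs with (assumption: presence is enough).
    """
    # Job-wide switches apply to every link.
    part_ok = "APT_NO_PART_INSERTION" not in env
    sort_ok = "APT_NO_SORT_INSERTION" not in env
    # Per link: an explicit method such as Same stops the engine choosing.
    part_ok = part_ok and link_partitioning == "(Auto)"
    # Per link: a Sort stage marked as previously sorted suppresses the sort.
    sort_ok = sort_ok and sort_key_mode != DONT_SORT
    return {"partitioner": part_ok, "sort": sort_ok}


if __name__ == "__main__":
    print(insertion_allowed("Same", None, env={}))
    print(insertion_allowed("(Auto)", DONT_SORT, env={}))
```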
We advise most users to leave both partitioner insertion and sort insertion enabled, and power users to confirm through careful analysis that the data really is partitioned and sorted as the downstream stages require before changing these options.

