Monday, 27 January 2014

Sort stage - Resorting on Sub-Groups

For the basics, see the post “Sort stage Basics”.

Resorting on Sub-Groups

Use the Sort Key Mode property to re-use key column groupings from previous sorts
              – Uses significantly less memory / disk!
                       • The sort now runs on each previously sorted key-column group, not on the entire dataset
                       • Rows are output after each group is processed

Key column order is important!
              – The key order must be consistent across Sort stages so that a later stage can sub-sort within the same key groups

The incoming sort order and partitioning must be retained (SAME partitioning) between the Sort stages.
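
The idea can be sketched outside DataStage. The Python snippet below is only an illustration, not DataStage code. It assumes the rows arrive already grouped on a primary key (a hypothetical region column) and sorts each group on a sub-key (a hypothetical amount column), emitting each group before reading the next one. The function name subsort and the sample rows are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def subsort(rows, group_key, sub_key):
    """Yield rows sorted by sub_key within each run of equal group_key values.

    Assumes the input is already grouped (or sorted) on group_key, so only
    one group has to be held in memory at a time.
    """
    for _, group in groupby(rows, key=itemgetter(group_key)):
        # Sort only this group's rows and emit them before reading the next group.
        yield from sorted(group, key=itemgetter(sub_key))


# Sample input: already sorted on "region", but not on "amount".
rows = [
    {"region": "East", "amount": 30},
    {"region": "East", "amount": 10},
    {"region": "West", "amount": 20},
    {"region": "West", "amount": 5},
]

result = list(subsort(rows, "region", "amount"))
for row in result:
    print(row["region"], row["amount"])
```

Because only one key group is sorted at a time, the working set is the size of the largest group rather than the whole dataset, which is where the memory / disk saving comes from. If the incoming order on the primary key were lost between the two steps, the same key value could show up in several separate runs and the result would no longer be correctly sorted.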




Don’t Sort (Previously Grouped)
What’s the difference between “Don’t Sort (Previously Sorted)” and “Don’t Sort (Previously Grouped)”?

When rows are previously grouped by a key, all the rows with the same key value are grouped together.
              – But the groups of rows are not necessarily in sort order.

When rows are previously sorted by a key, all the rows with the same key value are grouped together and, moreover, the groups are in sort order.

In either case, the Sort stage can be used to sort by a sub-key within each group of rows.
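
The difference between grouped and sorted can be shown with a small check. This is a sketch in Python, not DataStage code; the helper names is_grouped and is_sorted and the sample key lists are invented for the illustration.

```python
from itertools import groupby


def is_grouped(keys):
    """True if every key value appears in exactly one contiguous run."""
    runs = [k for k, _ in groupby(keys)]
    return len(runs) == len(set(runs))


def is_sorted(keys):
    """True if the keys are in ascending order (which also implies grouped)."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


grouped_only = ["B", "B", "A", "A", "C"]  # groups intact, but not in sort order
sorted_keys = ["A", "A", "B", "B", "C"]   # groups intact and in sort order
interleaved = ["A", "B", "A"]             # neither grouped nor sorted

for name, keys in [("grouped_only", grouped_only),
                   ("sorted_keys", sorted_keys),
                   ("interleaved", interleaved)]:
    print(name, is_grouped(keys), is_sorted(keys))
```

Both grouped_only and sorted_keys keep equal key values adjacent, which is all that a sub-sort within each group needs. Only sorted_keys also guarantees the order of the groups themselves.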


