Sort Stage Basics
Resorting on Sub-Groups
Use the Sort Key Mode property to re-use key-column groupings from previous sorts
– Uses significantly less memory / disk!
• The sort now operates on each previously sorted key-column group, not on the entire dataset
• Rows are output after each group
Key column order is important!
– Must be consistent across sort stages to be able to sub-sort on the same keys
Must retain incoming sort order and partitioning (SAME) between the sort stages
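The idea of sub-sorting within existing groups can be sketched outside DataStage. The Python below is only an illustration of the principle, not how the Sort stage is implemented; the function name `subsort` and the sample rows are made up for the example. It assumes the input already arrives with equal primary-key values adjacent to each other.

```python
from itertools import groupby
from operator import itemgetter


def subsort(rows, group_key, sub_key):
    """Yield rows sorted by sub_key within each run of equal group_key values.

    Assumes rows with the same group_key are already adjacent
    (previously sorted or previously grouped on that key).
    """
    for _, group in groupby(rows, key=group_key):
        # Only the current group is buffered and sorted,
        # and its rows are emitted before the next group is read.
        yield from sorted(group, key=sub_key)


# Hypothetical rows: (primary key, sub-key), already sorted on the primary key.
rows = [("A", 3), ("A", 1), ("B", 2), ("B", 1), ("C", 5)]
result = list(subsort(rows, itemgetter(0), itemgetter(1)))
print(result)
```

Because only one group is held at a time, the working set is bounded by the largest group rather than by the whole dataset, which is the reason for the memory / disk saving noted above.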
Don’t Sort (Previously Grouped)
What’s the difference between “Don’t Sort (Previously Sorted)” and “Don’t Sort (Previously Grouped)”?
When rows were previously grouped by a key, all the rows with the same key value are grouped together.
– But the groups of rows are not necessarily in sort order.
When rows were previously sorted by a key, all the rows with the same key value are grouped together and, moreover, the groups are in sort order.
In either case, the Sort stage can be used to sort by a sub-key within each group of rows.
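The distinction between grouped and sorted input can be made concrete with two small checks. This is a hedged sketch in Python; the helper names `is_grouped` and `is_sorted` and the sample key lists are invented for illustration and are not part of DataStage.

```python
def is_grouped(keys):
    """True if every key value occupies a single contiguous run."""
    seen = set()
    prev = object()  # sentinel that compares unequal to any real key
    for k in keys:
        if k != prev:
            if k in seen:
                return False  # key reappears after a different key
            seen.add(k)
            prev = k
    return True


def is_sorted(keys):
    """True if the keys are in ascending order."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


grouped_only = ["B", "B", "A", "C", "C"]    # grouped, but groups not in order
sorted_keys = ["A", "B", "B", "C"]          # sorted, and therefore also grouped
neither = ["A", "B", "A"]                   # "A" is split across two runs

print(is_grouped(grouped_only), is_sorted(grouped_only))
print(is_grouped(sorted_keys), is_sorted(sorted_keys))
print(is_grouped(neither), is_sorted(neither))
```

Sorted input always passes both checks, while grouped input only needs to pass the first. A sub-key sort within groups needs only the grouping property, which is why both modes allow it.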