Name | Process overview | Remarks |
---|---|---|
Read CSV File | Splits the data in half repeatedly until each partition is smaller than a certain size, then processes the partitions in parallel across multiple threads. (Intermediate data may be written to temporary files, depending on the size of the data or the number of CPU cores.) | |
Join | Processes the received input data in parallel across multiple threads, writes the required intermediate data to temporary files, and then joins it. The joined result is output in units of appropriately partitioned key groups. | |
Aggregate | Processes and aggregates the received input data in parallel across multiple threads. (Intermediate data may be written to temporary files, depending on the size of the data.) The aggregated result is output in units of appropriately partitioned groups. If no group key is specified, no partitioning is performed. | |
Sort | Processes and sorts the received input data in parallel across multiple threads. (Intermediate data may be written to temporary files, depending on the size of the data or the number of CPU cores.) The sorted result is output without partitioning so that its order is preserved. | |
Write CSV File | Processes the received input data in parallel across multiple threads. The order of the resulting data is not guaranteed. | |
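
The halving strategy described for Read CSV File can be illustrated with a minimal Python sketch. This is not the product's implementation: the function names (`split_into_halves`, `parse_partition`, `read_csv_parallel`), the row-count threshold `max_rows`, and the naive comma split are all assumptions made for illustration, and temporary-file spilling is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_halves(rows, max_rows):
    """Recursively halve `rows` until each partition has at most `max_rows` rows.

    `max_rows` stands in for the "certain size" threshold (an assumption).
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if len(rows) <= max_rows:
        return [rows]
    mid = len(rows) // 2
    return split_into_halves(rows[:mid], max_rows) + split_into_halves(rows[mid:], max_rows)


def parse_partition(lines):
    """Hypothetical per-partition work: split each line on commas (no quoting support)."""
    return [line.split(",") for line in lines]


def read_csv_parallel(lines, max_rows=2, workers=4):
    """Partition the lines, then parse the partitions on a thread pool."""
    partitions = split_into_halves(lines, max_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input partitions.
        parsed = list(pool.map(parse_partition, partitions))
    # Flatten the per-partition results back into one list of rows.
    return [row for part in parsed for row in part]


if __name__ == "__main__":
    sample = ["a,1", "b,2", "c,3", "d,4", "e,5"]
    print(split_into_halves(sample, 2))
    print(read_csv_parallel(sample))
```

In this sketch the partitions are parsed concurrently, but `pool.map` still hands results back in partition order. A writer that emits each partition as soon as it finishes would lose that ordering, which is consistent with the note that Write CSV File does not guarantee the order of its output.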