The Spark study notes can be followed here: https://github.com/MachineLP/Spark-
Types of Spark operations
There are three types of operations on RDDs: transformations, actions, and shuffles.
- The most expensive operations are those that require communication between nodes.
Transformations: RDD → RDD.
- Examples: map, filter, sample, and more.
- No communication needed.
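
To see why transformations such as map and filter need no communication, here is a toy model in plain Python. It is not the PySpark API: an RDD is imagined as a list of partitions, one per node, and each partition is processed on its own. The names `partitions`, `map_partitions`, and `filter_partitions` are hypothetical and used only for illustration.

```python
# Toy model, NOT PySpark: an "RDD" is a list of partitions (one list per node).
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def map_partitions(parts, f):
    """Apply f to every element; each partition is handled independently."""
    return [[f(x) for x in part] for part in parts]


def filter_partitions(parts, pred):
    """Keep elements satisfying pred; no partition looks at another one."""
    return [[x for x in part if pred(x)] for part in parts]


# RDD -> RDD: the result is still a list of partitions, with the same layout.
squared = map_partitions(partitions, lambda x: x * x)
even_squares = filter_partitions(squared, lambda x: x % 2 == 0)
print(even_squares)
```

The result keeps the same number of partitions as the input, which is the sense in which nothing moves between nodes. In PySpark itself, the corresponding chain would be written as `rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)`.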
Actions: RDD → Python object on the head node.
- Examples: reduce, collect, count, take, and more.
- Some communication needed.
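
The "some communication" in an action can be pictured with a toy model in plain Python. It is not the PySpark API, and the names `partitions`, `reduce_action`, and `count_action` are hypothetical. Each node first reduces its own partition, and only the small partial results are sent to the head node, where they are combined into an ordinary Python object.

```python
from functools import reduce

# Toy model, NOT PySpark: an "RDD" is a list of partitions (one list per node).
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def reduce_action(parts, f):
    """Model of a reduce: local reduction per partition, then a final combine."""
    # Step 1: every node reduces its own partition (no communication yet).
    partials = [reduce(f, part) for part in parts if part]
    # Step 2: only one value per partition travels to the head node.
    return reduce(f, partials)


def count_action(parts):
    """Model of a count: each node reports its length, the head node sums them."""
    return sum(len(part) for part in parts)


total = reduce_action(partitions, lambda a, b: a + b)
n = count_action(partitions)
print(total, n)
```

Here `total` and `n` are plain Python integers rather than distributed collections, matching the RDD → Python object description. In PySpark the analogous calls would be `rdd.reduce(lambda a, b: a + b)` and `rdd.count()`.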
Shuffle