Spark notes: Operations on (key, val) RDDs

The Spark study notes can be followed here: https://github.com/MachineLP/Spark-

Types of Spark operations

There are three types of operations on RDDs: transformations, actions, and shuffles.

  • The most expensive operations are those that require communication between nodes.

Transformations: RDD → RDD.

  • Examples: map, filter, sample, and more.
  • No communication needed.
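
To illustrate why transformations need no communication, here is a minimal plain-Python sketch that models an RDD as a list of partitions, one per worker. This is not Spark's API: `partitions`, `map_partitions`, and `filter_partitions` are hypothetical names used only for illustration. In PySpark the corresponding calls would be `rdd.map(f)` and `rdd.filter(f)`.

```python
# Toy model (an assumption for illustration, not Spark itself):
# an "RDD" is a list of partitions, each held by a different worker.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def map_partitions(parts, f):
    """Apply f to every element; each partition is processed on its own."""
    return [[f(x) for x in part] for part in parts]


def filter_partitions(parts, pred):
    """Keep the elements that satisfy pred, partition by partition."""
    return [[x for x in part if pred(x)] for part in parts]


# Each output partition depends only on the matching input partition,
# so no element ever has to move between workers.
squared = map_partitions(partitions, lambda x: x * x)
even_squares = filter_partitions(squared, lambda x: x % 2 == 0)

print(squared)       # [[1, 4, 9], [16, 25], [36, 49, 64, 81]]
print(even_squares)  # [[4], [16], [36, 64]]
```

Note that the result is again a list of partitions with the same layout as the input, which mirrors the RDD → RDD signature.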

Actions: RDD → Python object on the head node.

  • Examples: reduce, collect, count, take, and more.
  • Some communication needed.
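
The "some communication" in an action can be sketched with the same kind of plain-Python toy model, where an RDD is a list of partitions. The function names below (`reduce_action`, `count_action`, `collect_action`, `take_action`) are hypothetical and only mimic what PySpark's `rdd.reduce`, `rdd.count`, `rdd.collect`, and `rdd.take` return; this is a simplified sketch, not how Spark implements them.

```python
from functools import reduce
from itertools import chain, islice

# Toy model (an assumption for illustration): one inner list per worker.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]


def reduce_action(parts, op):
    """Reduce locally on each worker, then combine the partial results."""
    # Step 1: every non-empty partition is reduced where it lives.
    partials = [reduce(op, part) for part in parts if part]
    # Step 2: only one value per partition is sent to the head node.
    return reduce(op, partials)


def count_action(parts):
    """Each worker reports its partition size; the head node adds them up."""
    return sum(len(part) for part in parts)


def collect_action(parts):
    """Bring every element to the head node as a single Python list."""
    return list(chain.from_iterable(parts))


def take_action(parts, n):
    """Return the first n elements, reading no more than needed."""
    return list(islice(chain.from_iterable(parts), n))


total = reduce_action(partitions, lambda a, b: a + b)
print(total)                        # 45
print(count_action(partitions))     # 9
print(collect_action(partitions))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(take_action(partitions, 3))   # [1, 2, 3]
```

In this model `reduce` and `count` move only one value per partition, while `collect` moves the whole dataset, which is why `collect` is usually the costliest of the four on large RDDs. The two-step reduce gives the expected answer only when `op` is associative and commutative.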

Shuffle
