What is it?
partitioner is a property of RDD. Its default value is None, and RDD subclasses can override it.
@transient val partitioner: Option[Partitioner] = None
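What does it do?
It determines how the RDD is partitioned, that is, the concrete partitioning scheme. A few quick tests:
An RDD read with textFile has no partitioner by default: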
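import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RDDTest extends App {
  val conf = new SparkConf().setAppName("wordcount").setMaster("local")
  val sc = new SparkContext(conf)
  // Read the directory as text, asking for at least 2 partitions
  val lines: RDD[String] = sc.textFile("D:\\tmp", 2)
  println(lines.partitioner) // None
  sc.stop()
}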
An RDD created with parallelize also has no partitioner by default:
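import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RDDTest extends App {
  val conf = new SparkConf().setAppName("wordcount").setMaster("local")
  val sc = new SparkContext(conf)
  // Build an RDD from a local collection
  val rdd: RDD[Int] = sc.parallelize(Array(1, 2, 3))
  println(rdd.partitioner) // None
  sc.stop()
}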
A key-value RDD can be repartitioned with an explicit partitioner, as follows:
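import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RDDTest extends App {
  val conf = new SparkConf().setAppName("wordcount").setMaster("local")
  val sc = new SparkContext(conf)
  val rdd: RDD[Int] = sc.parallelize(Array(1, 2, 3))
  // partitionBy is only available on key-value RDDs, so map to pairs first
  val pairs: RDD[(Int, Int)] = rdd.map((x: Int) => (x, 1)).partitionBy(new HashPartitioner(3))
  println(pairs.partitioner) // Some(org.apache.spark.HashPartitioner@3)
  sc.stop()
}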
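To make the effect of the last example more concrete, here is a small Spark-free sketch of the rule a hash partitioner applies: take the key's hashCode modulo the number of partitions, and keep the result non-negative. The object name HashPartitionSketch and the helper partitionFor are hypothetical and exist only for illustration. This is a simplified model of the idea, not Spark's own code.

```scala
object HashPartitionSketch {
  // Hypothetical helper: maps a key to a partition index in [0, numPartitions).
  // Assumption: null keys go to partition 0; other keys use a non-negative modulo.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      // % can return a negative value for negative hash codes, so shift it up
      if (raw < 0) raw + numPartitions else raw
    }
  }

  def main(args: Array[String]): Unit = {
    // Same keys as the example above: (1,1), (2,1), (3,1) over 3 partitions
    val pairs = Seq(1, 2, 3).map(x => (x, 1))
    pairs.foreach { case (k, _) =>
      println(s"key $k -> partition ${partitionFor(k, 3)}")
    }
  }
}
```

Under this model the keys 1, 2 and 3 land in partitions 1, 2 and 0. In a real job, something like pairs.glom().collect() could be used to inspect the actual contents of each partition.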