groupBy(function)
The function returns a key, and the elements of the input RDD are grouped by that key.
val a = sc.parallelize(1 to 9)
a.groupBy(x => { if (x % 2 == 0) "even" else "odd" }).collect // splits the elements into two groups
/* Result
Array(
(even,ArrayBuffer(2, 4, 6, 8)),
(odd,ArrayBuffer(1, 3, 5, 7, 9))
)
*/
val a = sc.parallelize(1 to 9)
def myfunc(a: Int): Int =
{
a % 2 // splits the elements into two groups, keyed by 0 and 1
}
a.groupBy(myfunc).collect
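The grouping logic of the two groupBy calls above can be tried without a Spark cluster. The following is a minimal sketch that uses plain Scala collections as an analogy for RDD.groupBy. It is not Spark code, and the object and helper names (GroupBySketch, byParityLabel, byRemainder) are hypothetical, introduced only for illustration.

```scala
// Plain Scala collections, no Spark: an analogy for RDD.groupBy.
object GroupBySketch {
  // Hypothetical helper: group by a String key, as in the "even"/"odd" example.
  def byParityLabel(xs: Seq[Int]): Map[String, Seq[Int]] =
    xs.groupBy(x => if (x % 2 == 0) "even" else "odd")

  // Hypothetical helper: group by an Int key, as myfunc does.
  def byRemainder(xs: Seq[Int]): Map[Int, Seq[Int]] =
    xs.groupBy(_ % 2)

  def main(args: Array[String]): Unit = {
    val xs = 1 to 9
    println(byParityLabel(xs)) // two groups, keyed by "even" and "odd"
    println(byRemainder(xs))   // two groups, keyed by 0 and 1
  }
}
```

The key type is whatever the function returns: a String in the first helper and an Int in the second. On a real RDD the order of the groups returned by collect is not guaranteed.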
groupByKey()
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"))
val b = a.keyBy(_.length) // attach a key to each value; the key is the length of the string
b.groupByKey.collect
// Result: Array((4,ArrayBuffer(lion)), (6,ArrayBuffer(spider)), (3,ArrayBuffer(dog, cat)), (5,ArrayBuffer(tiger, eagle)))
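To make the keyBy plus groupByKey step concrete, here is a rough sketch of the same idea on plain Scala collections, with no Spark involved. The object and helper names (GroupByKeySketch, keyByLength, groupValuesByKey) are hypothetical and only mimic what the RDD methods do conceptually.

```scala
// Plain Scala collections, no Spark: an analogy for keyBy followed by groupByKey.
object GroupByKeySketch {
  // Hypothetical helper: pair each word with its length, like a.keyBy(_.length).
  def keyByLength(words: Seq[String]): Seq[(Int, String)] =
    words.map(w => (w.length, w))

  // Hypothetical helper: collect the values that share a key, like groupByKey.
  def groupValuesByKey[K, V](pairs: Seq[(K, V)]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  def main(args: Array[String]): Unit = {
    val words = List("dog", "tiger", "lion", "cat", "spider", "eagle")
    println(groupValuesByKey(keyByLength(words)))
  }
}
```

Unlike groupBy, groupByKey takes no function: it works on data that is already in (key, value) form and keeps only the values in each group.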
Reposted from: http://blog.csdn.net/guotong1988/article/details/50556871