
【Spark】groupBy / groupByKey

groupBy(function)

The function returns a key; each element of the input RDD is grouped according to that key.

val a = sc.parallelize(1 to 9, 3)
a.groupBy(x => { if (x % 2 == 0) "even" else "odd" }).collect // split into two groups
/* Result:
Array(
(even,ArrayBuffer(2, 4, 6, 8)),
(odd,ArrayBuffer(1, 3, 5, 7, 9))
)
*/
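A hedged aside (not from the original post): groupBy also accepts an explicit number of partitions for the resulting RDD, so the same grouping can be written as, for example:

a.groupBy(x => if (x % 2 == 0) "even" else "odd", 2).collect // same groups; the result RDD uses 2 partitions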
val a = sc.parallelize(1 to 9, 3)
def myfunc(a: Int) : Int =
{
  a % 2 // split into two groups
}
a.groupBy(myfunc).collect
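For reference (not shown in the original), this groups the elements by their remainder mod 2; the exact buffer type in the output depends on the Spark version:

// Array((0,ArrayBuffer(2, 4, 6, 8)), (1,ArrayBuffer(1, 3, 5, 7, 9)))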

groupByKey()

Groups the values of a pair RDD (an RDD of key/value pairs) by key, collecting all values that share a key.

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length) // attach a key to each value; the key is the string's length
b.groupByKey.collect
// Result: Array((4,ArrayBuffer(lion)), (6,ArrayBuffer(spider)), (3,ArrayBuffer(dog, cat)), (5,ArrayBuffer(tiger, eagle)))
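A minimal follow-up sketch (my own, not from the original post): keyBy(_.length) plus groupByKey produces the same grouping as calling groupBy on the strings directly, and mapValues can then summarize each group:

val byLength = a.groupBy(_.length)  // same grouping in a single step
byLength.mapValues(_.size).collect  // counts per length, e.g. Array((4,1), (6,1), (3,2), (5,2)); key order may vary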

           

Source: http://blog.csdn.net/guotong1988/article/details/50556871
