[Spark][Python] Examples of left and right joins on DataFrames
First, inspect the two input JSON files on HDFS:
$ hdfs dfs -cat people.json
$ hdfs dfs -cat pcodes.json
Start the pyspark shell and create a HiveContext (the Spark 1.x entry point for DataFrames):
$ pyspark
sqlContext = HiveContext(sc)
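In the pyspark shell, sc already exists and HiveContext is available; if the same steps are run as a standalone script instead, the contexts have to be created explicitly. A minimal Spark 1.x sketch (the appName value is an arbitrary assumption, not from the original post):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="join-examples")  # hypothetical application name
sqlContext = HiveContext(sc)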
Load people.json into a DataFrame and preview it:
peopleDF = sqlContext.read.json("people.json")
peopleDF.limit(5).show()


Load pcodes.json the same way:
pcodesDF = sqlContext.read.json("pcodes.json")
pcodesDF.limit(5).show()
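Before joining, it can help to confirm that both DataFrames expose a pcode column; printSchema() prints the schema Spark inferred from the JSON (the other column names depend on the files and are not shown in the post):

peopleDF.printSchema()  # should list a pcode field among the inferred columns
pcodesDF.printSchema()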


Join on the shared pcode column; with no join type given this is an inner join:
mydf000 = peopleDF.join(pcodesDF, "pcode")
mydf000.limit(5).show()
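Joining on the column name (a string) keeps a single pcode column in the result. The same join can also be written with an explicit column expression, in which case both pcode columns are kept; a small sketch of that variant:

mydf000b = peopleDF.join(pcodesDF, peopleDF.pcode == pcodesDF.pcode)  # explicit join condition
mydf000b.limit(5).show()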


A left semi join keeps only the rows of peopleDF that have a matching pcode in pcodesDF:
mydf001 = peopleDF.join(pcodesDF, "pcode", "leftsemi")
mydf001.limit(5).show()
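Unlike the other join types, leftsemi returns only peopleDF's columns, so it behaves like a filter rather than a widening join; comparing the schemas of the inner-join and semi-join results makes that visible:

mydf000.printSchema()  # inner join: columns from both DataFrames
mydf001.printSchema()  # leftsemi: columns from peopleDF only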


A left outer join keeps every row of peopleDF, filling the right-side columns with NULL where there is no matching pcode:
mydf002 = peopleDF.join(pcodesDF, "pcode", "left_outer")
mydf002.limit(5).show()
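A common use of a left outer join is finding the unmatched rows. A sketch, assuming pcodes.json has a column named city (a hypothetical column name, since the post does not show the file contents):

from pyspark.sql.functions import col
unmatched = mydf002.where(col("city").isNull())  # people whose pcode has no entry in pcodes.json
unmatched.limit(5).show()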


A right outer join keeps every row of pcodesDF instead:
mydf003 = peopleDF.join(pcodesDF, "pcode", "right_outer")
mydf003.limit(5).show()
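Not covered in the original post, but closely related: a full outer join (join type "outer") keeps unmatched rows from both sides at once:

mydf004 = peopleDF.join(pcodesDF, "pcode", "outer")  # full outer join
mydf004.limit(5).show()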


This article is reposted from the cnblogs blog 健哥的資料花園; original link: http://www.cnblogs.com/gaojian/p/7633001.html. Please contact the original author before reprinting.