
Spark DataFrame withColumn

Note: withColumn is used to add a new column to an existing DataFrame (DF).

1. Initialize the sqlContext

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
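Aside: in Spark 2.x, SQLContext is deprecated in favor of SparkSession, which is why a deprecation warning appears in the session transcript at the end. A minimal sketch of the newer entry point (the appName is illustrative; in spark-shell a SparkSession named spark is already provided):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("withColumnDemo").getOrCreate()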

2. Import the sqlContext implicit conversions

import sqlContext.implicits._ 
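Among other things, this import enables the toDF conversion on local collections and the $"col" column syntax. A small illustrative example (the data is made up, purely to show what the implicits provide):

val demo = Seq(("Alice", 1), ("Bob", 2)).toDF("name", "id")
demo.select($"name").show()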

3. Create a DataFrame

val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")
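For reference, the bundled people.json holds one JSON object per line (line-delimited JSON, which is the format read.json expects):

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}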

4. Show the contents

df.show()   

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
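read.json infers the schema from the data, which is why the transcript later reports df as [age: bigint, name: string]; Michael's age is null because his record omits the field. The inferred schema can be inspected directly:

df.printSchema()
// root
//  |-- age: long (nullable = true)
//  |-- name: string (nullable = true)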

5. Add a new column to the existing df

import org.apache.spark.sql.functions.monotonically_increasing_id  // auto-imported in spark-shell; required in compiled code

df.withColumn("id2", monotonically_increasing_id() + 1)
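DataFrames are immutable, so withColumn returns a new DataFrame rather than modifying df in place; since the result above is not assigned, spark-shell captures it as the auto-generated value res6 used in the next step. To keep an explicit handle on it instead (dfWithId is an illustrative name):

val dfWithId = df.withColumn("id2", monotonically_increasing_id() + 1)
dfWithId.show()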

6. Show the contents after adding the column

res6.show()

+----+-------+---+
| age|   name|id2|
+----+-------+---+
|null|Michael|  1|
|  30|   Andy|  2|
|  19| Justin|  3|
+----+-------+---+
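The IDs come out as 1, 2, 3 here only because this tiny dataset fits in a single partition. monotonically_increasing_id() guarantees unique, monotonically increasing values, not consecutive ones; with multiple partitions the IDs will have large gaps. If strictly consecutive numbering is required, one sketch is row_number over a window, at the cost of pulling all rows through a single partition (the ordering column is chosen only for illustration):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

df.withColumn("id2", row_number().over(Window.orderBy("name")))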

The complete spark-shell session is as follows:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc) 

warning: there was one deprecation warning; re-run with -deprecation for details

sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2513155a

scala> import sqlContext.implicits._

import sqlContext.implicits._

scala> val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")

2018-06-25 18:55:30 WARN  ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0

2018-06-25 18:55:30 WARN  ObjectStore:568 - Failed to get database default, returning NoSuchObjectException

2018-06-25 18:55:32 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.show()

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> df.withColumn("id2", monotonically_increasing_id()+1)

res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string ... 1 more field]

scala> res6.show()
+----+-------+---+
| age|   name|id2|
+----+-------+---+
|null|Michael|  1|
|  30|   Andy|  2|
|  19| Justin|  3|
+----+-------+---+
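Finally, withColumn is not limited to generated IDs: it can derive a new column from existing ones, and if the supplied name already exists it replaces that column. A brief sketch (the column name age10 is illustrative):

df.withColumn("age10", $"age" + 10).show()   // derive from an existing column
df.withColumn("age", $"age" + 1).show()      // same name: replaces the age column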