PySpark: "Params must be either a param map or a list/tuple of param maps, ..." error when calling fit on a training dataset

2023-05-19 21:47:56

I was training a decision tree on my data in Anaconda:

from pyspark.ml.classification import DecisionTreeClassifier
dt=DecisionTreeClassifier(labelCol="label",featuresCol="features")
dt_model=dt.fit(dfff)

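This raised an error.

With no other leads, I searched Google and eventually guessed at the cause: when I used cast to convert the data types, some values could not be converted and became null.

So after converting the types, I called dff.dropna(). Checking the row count with dfff.count() showed that, sure enough, there were 10 fewer rows than before.

Running the code above again raised no error.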

df=df.select([df[col].cast("double") for col in df.columns])
df=df.dropna()

Running the code below produces this result:

print(dt_model)

DecisionTreeClassificationModel (uid=DecisionTreeClassifier_49c7b461375198d98f68) of depth 5 with 59 nodes

Looking at the error message below, you will find this line:

Caused by: org.apache.spark.SparkException: Values to assemble cannot be null.
