Background: data exported from MySQL and saved as CSV files.
Two files were exported: record1.csv, which includes a CSV header row,
and record0.csv, which does not.
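To make the steps below easy to follow, here is a hypothetical sketch of what the two files are assumed to look like, using made-up rows and only four of the table's columns. Note that the export wraps every field in double quotes, which becomes important later.

```python
import csv

# Hypothetical sample rows (made up; only four of the real columns).
# The MySQL export quotes every field with ", mimicked via QUOTE_ALL.
header = ["exer_recore_id", "user_id", "is_correct", "record_id"]
rows = [
    ["1001110000021295", "11", "0", "109"],
    ["1001110000021296", "11", "0", "109"],
]

# record1.csv: header row included.
with open("record1.csv", "w", newline="") as f:
    w = csv.writer(f, quoting=csv.QUOTE_ALL)
    w.writerow(header)
    w.writerows(rows)

# record0.csv: same data, no header row.
with open("record0.csv", "w", newline="") as f:
    w = csv.writer(f, quoting=csv.QUOTE_ALL)
    w.writerows(rows)
```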
try1
Create the table and load the file directly.
sql='''
CREATE TABLE IF NOT EXISTS default.records
(
exer_recore_id BIGINT,
user_id int,
channel tinyint,
item_id int,
question_id int,
`type` tinyint,
question_num int,
orgin tinyint,
orgin_id int,
cate_id int,
is_hf tinyint,
user_answer string,
correct_answer string,
is_correct tinyint comment "correct or not",
dutration int,
record_id int,
subject_1 int,
subject_2 int,
subject_3 int,
status tinyint,
is_signed tinyint,
create_time int,
practice_level_1 string,
practice_level_2 string,
practice_level_3 string,
update_time int
)
comment "exercise records"
'''
spark.sql(sql)
spark.sql("LOAD DATA LOCAL INPATH '*/Desktop/record0.csv' OVERWRITE INTO TABLE "
"records")
df=spark.sql("select exer_recore_id,user_id,is_correct from records ")
df.show()
df.printSchema()
Output:
+--------------+-------+----------+
|exer_recore_id|user_id|is_correct|
+--------------+-------+----------+
| null| null| null|
| null| null| null|
| null| null| null|
| null| null| null|
| null| null| null|
Import failed: every column comes back NULL.
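A quick sketch of why everything is NULL: with no delimiter declared, Hive falls back to its default field separator \x01 (Ctrl-A), which never occurs in a CSV line, so the whole line lands in the first column and every typed cast fails.

```python
# Hive's default field delimiter is \x01 (Ctrl-A). A CSV line contains
# no \x01, so splitting on it leaves the entire line as one "field";
# casting that to BIGINT/INT fails, which Hive surfaces as NULL.
line = '"1001110000021295","11","0","109"'
fields = line.split("\x01")
print(len(fields))  # 1
```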
try2
Add a field delimiter to the CREATE TABLE statement.
comment "exercise records"
ROW FORMAT delimited fields terminated by ','
Result: failure. Declaring the CSV delimiter ',' still does not help.
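Why the comma delimiter alone is not enough: the exported fields are themselves quoted, so a plain split on ',' leaves the quotes attached and the numeric casts still fail. A sketch:

```python
# Splitting on ',' works, but each value keeps its surrounding quotes;
# Hive then tries to cast '"11"' to INT, fails, and returns NULL.
line = '"1001110000021295","11","0","109"'
fields = line.split(",")
print(fields[1])  # prints "11", quotes still attached
try:
    int(fields[1])
except ValueError:
    print("cast fails -> NULL in Hive")
```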
Opening the file in a plain-text editor (screenshot omitted) shows that every field is wrapped in double quotes.
try3
comment "exercise records"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'"
)
success
Next problem: keep the header row from being loaded as data.
Add the table property tblproperties("skip.header.line.count"="1");
the "1" means the first line of the file is skipped.
comment "exercise records"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'"
)
tblproperties("skip.header.line.count"="1")
Solved.
again
Got careless, and it came back to bite me.
The output from the query above:
+------------------+-------+----------+---------+
| exer_recore_id|user_id|is_correct|record_id|
+------------------+-------+----------+---------+
|"1001110000021295"| "11"| "0"| "109"|
|"1001110000021296"| "11"| "0"| "109"|
|"1001110000021297"| "11"| "1"| "109"|
root
|-- exer_recore_id: string (nullable = true)
|-- user_id: string (nullable = true)
|-- is_correct: string (nullable = true)
|-- record_id: string (nullable = true)
Every column is a string, and the values still carry the " quotes.
Fix:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "\\"",
"escapeChar" = "\\\\"
)
tblproperties("skip.header.line.count"="1")
The quote character in the data is " (a double quote), not ', so quoteChar has to be updated accordingly.
Output:
+----------------+-------+----------+---------+
| exer_recore_id|user_id|is_correct|record_id|
+----------------+-------+----------+---------+
|1001110000021295| 11| 0| 109|
|1001110000021296| 11| 0| 109|
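This matches what any CSV parser does once the quote character is set to " instead of ': the quotes are stripped during parsing, so clean values reach the table. A sketch using Python's csv module to mirror the OpenCSVSerde settings:

```python
import csv
import io

# With quotechar='"' (mirroring OpenCSVSerde's quoteChar = "\""), the
# parser strips the quotes and the raw values come through clean.
data = '"1001110000021295","11","0","109"\n'
row = next(csv.reader(io.StringIO(data), delimiter=",", quotechar='"'))
print(row)  # ['1001110000021295', '11', '0', '109']
```

One caveat worth knowing: OpenCSVSerde still exposes every column as STRING regardless of the types declared in the DDL, so numeric columns may need an explicit cast at query time.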