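1. Data description
Consider the following data set: Age is a floating-point number; New_Salutation (title) takes five values: Mr, Mrs, Miss, Master, Others; Pclass (class) takes three values: 1, 2, 3; Sex takes two values: male, female. The first rows of the data look like this: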
>>> df[["Age", "New_Salutation", "Pclass", "Sex"]]
         Age  New_Salutation  Pclass     Sex
0  22.000000              Mr       3    male
1  38.000000             Mrs       1  female
2  26.000000            Miss       3  female
3  35.000000             Mrs       1  female
4  35.000000              Mr       3    male
5  29.699118              Mr       3    male
6  54.000000              Mr       1    male
7   2.000000          Master       3    male
8  27.000000             Mrs       3  female
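To experiment with the calls in this article without the full data set, the nine rows listed above can be typed into a small DataFrame. This is only a sketch: it holds just those nine rows, so any statistics computed from it will differ from the results shown for the complete data.

```python
import pandas as pd

# The nine sample rows shown above, entered by hand (not the full data set).
df = pd.DataFrame(
    {
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 29.699118, 54.0, 2.0, 27.0],
        "New_Salutation": ["Mr", "Mrs", "Miss", "Mrs", "Mr", "Mr", "Mr", "Master", "Mrs"],
        "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3],
        "Sex": ["male", "female", "female", "female", "male", "male", "male", "male", "female"],
    }
)

# Inspect the column types and how often each title occurs in the sample.
print(df.dtypes)
print(df["New_Salutation"].value_counts())
```

Age comes out as float64 and Pclass as int64, matching the description at the start of this section.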
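2. pivot_table usage example
The data can be analyzed with the pivot_table method of the pandas library, as shown below (the call assumes NumPy has been imported as np).
Meaning of the pivot_table parameters: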
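values: the pivot table displays values computed from Age.
index: rows are indexed and sorted by the five values of New_Salutation (Mr, Mrs, Miss, Master, Others).
columns: the data is first split into three groups by the three values of Pclass (1, 2, 3), and each group is then split in two by Sex (male, female), giving six groups in total. Passing only Pclass splits the data into three groups with no further subdivision.
aggfunc: each value in the pivot table is the median of Age within its group.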
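For example, among the people whose New_Salutation is Master, Pclass is 1 and Sex is male, the median Age is 4.0; see the Master row, column (1, male), in the output below.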
>>> table = df.pivot_table(values="Age", index=["New_Salutation"], columns=["Pclass", "Sex"], aggfunc=np.median)
>>> table
The output is:
Pclass               1             2                3
Sex             female  male  female  male     female       male
New_Salutation
Master             NaN   4.0     NaN   1.0        NaN   6.500000
Miss              30.0   NaN    24.0   NaN  22.000000        NaN
Mr                 NaN  36.0     NaN  30.0        NaN  29.699118
Mrs               38.5   NaN    32.0   NaN  29.699118        NaN
Others            28.5  47.0    28.0  46.5        NaN        NaN
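The two forms of the columns parameter described above can be compared on a small sample. The sketch below rebuilds only the nine rows listed in section 1, so its numbers differ from the full-data output above. It passes the string "median" as aggfunc, which selects the same aggregation as np.median. The names by_class, by_class_sex, mask and manual are introduced here purely for illustration.

```python
import pandas as pd

# The nine sample rows from section 1, entered by hand (not the full data set).
df = pd.DataFrame(
    {
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 29.699118, 54.0, 2.0, 27.0],
        "New_Salutation": ["Mr", "Mrs", "Miss", "Mrs", "Mr", "Mr", "Mr", "Master", "Mrs"],
        "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3],
        "Sex": ["male", "female", "female", "female", "male", "male", "male", "male", "female"],
    }
)

# columns="Pclass": one column per class value that occurs in the sample.
by_class = df.pivot_table(
    values="Age", index="New_Salutation", columns="Pclass", aggfunc="median"
)

# columns=["Pclass", "Sex"]: a two-level column index, Pclass first, then Sex.
by_class_sex = df.pivot_table(
    values="Age", index="New_Salutation", columns=["Pclass", "Sex"], aggfunc="median"
)

# Recompute one cell by hand: the median Age of rows with Mr, Pclass 3, male.
mask = (df["New_Salutation"] == "Mr") & (df["Pclass"] == 3) & (df["Sex"] == "male")
manual = df.loc[mask, "Age"].median()

print(by_class)
print(by_class_sex)
print(manual, by_class_sex.loc["Mr", (3, "male")])
```

A cell in the two-level table is addressed by its row label plus a (Pclass, Sex) tuple, which is how the hand-computed median is compared with the pivot table entry in the last line.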