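pandas lets you select elements with loc and iloc. The older ix indexer is not recommended: it was deprecated and then removed in pandas 1.0.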
import pandas as pd
data = {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}
df = pd.DataFrame(data=data, index=['foo', 'bar', 'boo', 'kar'])
df
pandas.iloc
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
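The boolean-array form mentioned above can be sketched as follows. This is a minimal example that rebuilds the same df; the names mask and by_mask are illustrative, and the mask is converted to a plain NumPy array first because that is a form iloc is documented to accept.

```python
import pandas as pd

data = {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}
df = pd.DataFrame(data=data, index=['foo', 'bar', 'boo', 'kar'])

# Build a plain boolean NumPy array with one entry per row.
mask = (df['AAA'] > 5).to_numpy()

# iloc keeps the rows whose mask entry is True.
by_mask = df.iloc[mask]
print(list(by_mask.index))  # ['boo', 'kar']
```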
# Select the rows at positions 1 and 2 (0-based); note that the range is closed on the left and open on the right
df.iloc[1:3]
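To make the half-open range concrete, here is a small sketch on the same df. It also shows the single-integer and list-of-integers forms; the variable names are illustrative only.

```python
import pandas as pd

data = {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}
df = pd.DataFrame(data=data, index=['foo', 'bar', 'boo', 'kar'])

# Positions 1 and 2 are returned; position 3 ('kar') is excluded.
picked = df.iloc[1:3]
print(list(picked.index))  # ['bar', 'boo']

# A single integer returns one row as a Series.
first_row = df.iloc[0]
print(first_row['AAA'])  # 4

# A list of integers returns a DataFrame with exactly those rows.
two_rows = df.iloc[[0, 3]]
print(list(two_rows.index))  # ['foo', 'kar']
```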
pandas.loc
Access a group of rows and columns by label(s) or a boolean array.
# Select the rows whose index labels run from 'bar' to 'kar'; unlike iloc, a label slice includes both endpoints
df.loc['bar':'kar']
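The difference from positional slicing is easy to miss, so the sketch below puts the two side by side on the same df and also selects rows and columns by label in one call. Variable names are illustrative.

```python
import pandas as pd

data = {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}
df = pd.DataFrame(data=data, index=['foo', 'bar', 'boo', 'kar'])

# Label slice: both 'bar' and 'kar' are included.
by_label = df.loc['bar':'kar']
print(list(by_label.index))  # ['bar', 'boo', 'kar']

# Positional slice over "the same" start and stop: the stop position is excluded.
by_position = df.iloc[1:3]
print(list(by_position.index))  # ['bar', 'boo']

# Rows and columns can be selected by label together.
block = df.loc[['foo', 'kar'], ['AAA', 'CCC']]
print(block.loc['kar', 'CCC'])  # -50
```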
More usage
Using callable indexing
data = {'AAA':[4,5,6,7], 'BBB':[10,20,30,40], 'CCC':[100,50,-30, -50]}
df2 = pd.DataFrame(data=data)
df2
# Boolean mask: drop the rows where AAA <= 6 and the index label is one of 0, 2, 4
df2[~((df2.AAA <= 6) & (df2.index.isin([0, 2, 4])))]
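The combined mask above is dense, so here is a step-by-step sketch of how it evaluates, followed by the same filter written as a callable. The intermediate names (small_aaa, in_index, both) are illustrative and not part of the original.

```python
import pandas as pd

data = {'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}
df2 = pd.DataFrame(data=data)  # default integer index 0..3

small_aaa = df2.AAA <= 6              # True, True, True, False
in_index = df2.index.isin([0, 2, 4])  # True, False, True, False (label 4 does not exist)
both = small_aaa & in_index           # True, False, True, False

# ~ inverts the mask, so rows 1 and 3 are kept.
kept = df2[~both]
print(list(kept.index))  # [1, 3]

# The same filter as a callable: pandas calls it with df2 and uses the returned mask.
kept_callable = df2[lambda d: ~((d.AAA <= 6) & d.index.isin([0, 2, 4]))]
print(kept.equals(kept_callable))  # True
```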
import numpy as np
df1 = pd.DataFrame(np.random.randn(6, 4),
                   index=list('abcdef'),
                   columns=list('ABCD'))
df1
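# Select the rows where column A is greater than 0 (the comma separates the row selector from the column selector)
df1.loc[lambda df: df.A > 0, :]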
# Select columns A and B by label
df1.loc[:, lambda df: ['A', 'B']]
# Select the first and second columns by position
df1.iloc[:, lambda df: [0, 1]]
# Select the first column
df1[lambda df: df.columns[0]]
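Because df1 is random, the results above change on every run. The sketch below uses a seeded generator (the seed value 0 is an arbitrary choice, not from the original) and checks each callable form against its non-callable equivalent, which is a reasonable way to convince yourself what the callable returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df1 = pd.DataFrame(rng.standard_normal((6, 4)),
                   index=list('abcdef'),
                   columns=list('ABCD'))

# Row filter: the callable receives df1 and returns a boolean Series.
rows = df1.loc[lambda df: df.A > 0, :]
assert rows.equals(df1.loc[df1.A > 0, :])

# Column selection by label and by position give the same two columns.
cols_by_label = df1.loc[:, lambda df: ['A', 'B']]
cols_by_pos = df1.iloc[:, lambda df: [0, 1]]
assert cols_by_label.equals(cols_by_pos)

# Plain [] with a callable that returns a column label yields that column as a Series.
first_col = df1[lambda df: df.columns[0]]
assert first_col.equals(df1['A'])
```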
Using callable indexing on a Series
# Select the values in column A that are greater than 0
df1.A.loc[lambda s: s > 0]
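A callable is most useful on a Series when the Series is an intermediate result that has no name of its own. The following sketch, on a small made-up Series, filters the doubled values without storing them first.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=list('abcd'))  # illustrative data

# (s * 2) has no variable name; the callable receives it as x.
big = (s * 2).loc[lambda x: x > 4]
print(big.to_dict())  # {'c': 6, 'd': 8}
```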
With these methods you can chain selections without using temporary variables.
bb = pd.read_csv('data/baseball.csv', index_col='id')
(bb.groupby(['year','team']).sum().loc[lambda df: df.r > 100])
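The baseball.csv file is not included here, so the sketch below substitutes a tiny made-up table with the same column names (id, year, team, r). The values are invented purely for illustration. It compares the chained form with the version that needs a temporary variable.

```python
import pandas as pd

# Hypothetical stand-in for data/baseball.csv; the rows are made up.
bb = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'year': [2006, 2006, 2007, 2007, 2007],
    'team': ['CHN', 'CHN', 'ATL', 'ATL', 'BOS'],
    'r': [60, 55, 40, 30, 120],
}).set_index('id')

# Chained: the callable receives the aggregated frame directly.
chained = bb.groupby(['year', 'team']).sum().loc[lambda df: df.r > 100]
print(chained.index.tolist())  # [(2006, 'CHN'), (2007, 'BOS')]

# Same result with a temporary variable.
totals = bb.groupby(['year', 'team']).sum()
with_temp = totals[totals.r > 100]
print(chained.equals(with_temp))  # True
```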