有的時候會經常遇到條件過濾的場景,這個時候可能經常使用isin或者是~來進行一步操作,而不是寫條件語句的方式,這樣來提高效率和簡潔度。
1、直接根據條件進行索引,isin()接受一個清單,判斷該列中元素是否在清單中
import numpy as np
import pandas as pd
df=pd.DataFrame(np.random.randn(4,4),columns=['A','B','C','D'])
df
Out[189]:
A B C D
0 0.289595 0.202207 -0.850390 0.197016
1 0.403254 -1.287074 0.916361 0.055136
2 -0.359261 -1.266615 -0.733625 -0.790208
3 0.164862 -0.649637 0.716620 1.447703
df['E'] = ['aa', 'bb', 'cc', 'cc']
df
Out[191]:
A B C D E
0 0.289595 0.202207 -0.850390 0.197016 aa
1 0.403254 -1.287074 0.916361 0.055136 bb
2 -0.359261 -1.266615 -0.733625 -0.790208 cc
3 0.164862 -0.649637 0.716620 1.447703 cc
df.E.isin(['aa','cc'])
Out[192]:
0 True
1 False
2 True
3 True
Name: E, dtype: bool
df[df.E.isin(['aa','cc'])]
Out[193]:
A B C D E
0 0.289595 0.202207 -0.850390 0.197016 aa
2 -0.359261 -1.266615 -0.733625 -0.790208 cc
3 0.164862 -0.649637 0.716620 1.447703 cc
2、根據多條件進行索引,此時用&(交集)或者|(并集)進行連接配接
df[df.E.isin(['aa'])|df.E.isin(['cc'])]
Out[194]:
A B C D E
0 0.289595 0.202207 -0.850390 0.197016 aa
2 -0.359261 -1.266615 -0.733625 -0.790208 cc
3 0.164862 -0.649637 0.716620 1.447703 cc
df[df.E.isin(['aa'])]
Out[195]:
A B C D E
0 0.289595 0.202207 -0.85039 0.197016 aa
3、通過字典的形式傳遞多個條件{‘某列’:[條件],‘某列’:[條件],}
df['D'] = [1,2,3,4]
df[df.isin({'D':[0,3],'E':['aa','cc']})]
Out[200]:
A B C D E
0 NaN NaN NaN NaN aa
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN 3.0 cc
3 NaN NaN NaN NaN cc
4、~相當于isnotin
df[~(df.E=='cc')]
Out[202]:
A B C D E
0 0.289595 0.202207 -0.850390 1 aa
1 0.403254 -1.287074 0.916361 2 bb