天天看點

對比SQL查詢語句與Pandas文法(SQL vs Pandas)---基礎篇

某些SQL查詢語句與Pandas語句可以互相轉換。

以下表(命名為df)為例:

sex   tip  total_bill
0  Female  1.01       16.99
1    Male  1.66       10.34
2    Male  3.50       23.68
3    Male  3.31       23.68
4  Female  3.61       24.59      
import pandas as pd
import numpy as np

df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})      

1,選擇

SQL:SELECT ... FROM ...

SELECT total_bill, tip 
FROM df      

Pandas:loc, iloc

df.loc[:, ['total_bill', 'tip']]
      

2,條件過濾

SQL:WHERE

SELECT * 
FROM df 
WHERE total_bill > 20      

Pandas:df[df[colunm] boolean expr],.query()

df[df['total_bill'] > 20]      
df.query('total_bill > 20')      

3,限制行數

SQL:LIMIT

SELECT total_bill, tip 
FROM df 
LIMIT 3      

Pandas:loc, iloc,.head()

df.loc[:2, ['total_bill', 'tip']]      
df[['total_bill', 'tip']].head(3)      

4,跳過指定行數

SQL:OFFSET

SELECT total_bill, tip 
FROM df 
LIMIT 3 OFFSET 1      

Pandas:loc, iloc,.tail()

df.loc[1:3, ['total_bill', 'tip']]      
df.loc[:3, ['total_bill', 'tip']].tail(3)      

5,不重複的值

SQL:DISTINCT

SELECT DISTINCT sex 
FROM df       

Python:.unique()

df['sex'].unique()      

6,條件連接配接

SQL:AND,OR

SELECT * 
FROM df
WHERE sex = "Female" 
    AND tip > 2      

Pandas:&,|

df[(df['sex']=='Female') & (df['tip']>2)]      

7,排序

SQL:ORDER BY

SELECT * 
FROM df 
ORDER BY total_bill DESC, tip      

Pandas:.sort_values()

df.sort_values(['total_bill', 'tip'], ascending=[False, True])      

8,在/不在...裡面

SQL:IN,NOT IN

SELECT * 
FROM df 
WHERE tip NOT IN (1.66, 3.31)      

Pandas:.isin(),~

df[~df['tip'].isin([1.66, 3.31])]      

9,是否是空值

SQL:IS NULL,IS NOT NULL

SELECT total_bill 
FROM df 
WHERE total_bill IS NOT NULL      

Pandas:.isnull(),.notnull()

df.total_bill[df['total_bill'].notnull()]      

10,分組

SQL:GROUP BY

SELECT sex, COUNT(tip) 
FROM df 
GROUP BY sex      

Pandas:.groupby()

df.groupby('sex')['tip'].count()      

11,計數

SQL:COUNT

SELECT sex, COUNT(tip) 
FROM df 
GROUP BY sex      

Pandas:.count(), .size()

df.groupby('sex')['tip'].count()      

注:.count()傳回非空字元出現的次數,相當于SQL中的COUNT(col_name);而.size()傳回所有行數,相當于SQL中的COUNT(*)。

12,對分組結果進行條件過濾

SQL:HAVING

SELECT sex, COUNT(tip) 
FROM df 
GROUP BY sex 
HAVING COUNT(tip) > 2      

Pandas:

temp=df.groupby("sex")["tip"].count()
temp[temp>2]      

13,聚合函數

SQL: MIN,MAX,AVG,SUM

SELECT sex, MAX(tip), SUM(total_bill) 
FROM df 
GROUP BY sex      

Pandas:.agg()

df.groupby('sex').agg({'tip': np.max, 'total_bill': np.sum})      

14,表連接配接

SQL:JOIN,LEFT JOIN,RIGHT JOIN

Pandas:.merge()

15, 表并集

SQL:UNION,UNION ALL

Pandas:pd.concat() + .drop_dupplicates(),pd.concat()

繼續閱讀