某些SQL查詢語句與Pandas語句可以互相轉換。
以下表(命名為df)為例:
sex tip total_bill
0 Female 1.01 16.99
1 Male 1.66 10.34
2 Male 3.50 23.68
3 Male 3.31 23.68
4 Female 3.61 24.59
import pandas as pd
import numpy as np
df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
1,選擇
SQL:SELECT ... FROM ...
SELECT total_bill, tip
FROM df
Pandas:loc, iloc
df.loc[:, ['total_bill', 'tip']]
2,條件過濾
SQL:WHERE
SELECT *
FROM df
WHERE total_bill > 20
Pandas:df[df[colunm] boolean expr],.query()
df[df['total_bill'] > 20]
df.query('total_bill > 20')
3,限制行數
SQL:LIMIT
SELECT total_bill, tip
FROM df
LIMIT 3
Pandas:loc, iloc,.head()
df.loc[:2, ['total_bill', 'tip']]
df[['total_bill', 'tip']].head(3)
4,跳過指定行數
SQL:OFFSET
SELECT total_bill, tip
FROM df
LIMIT 3 OFFSET 1
Pandas:loc, iloc,.tail()
df.loc[1:3, ['total_bill', 'tip']]
df.loc[:3, ['total_bill', 'tip']].tail(3)
5,不重複的值
SQL:DISTINCT
SELECT DISTINCT sex
FROM df
Python:.unique()
df['sex'].unique()
6,條件連接配接
SQL:AND,OR
SELECT *
FROM df
WHERE sex = "Female"
AND tip > 2
Pandas:&,|
df[(df['sex']=='Female') & (df['tip']>2)]
7,排序
SQL:ORDER BY
SELECT *
FROM df
ORDER BY total_bill DESC, tip
Pandas:.sort_values()
df.sort_values(['total_bill', 'tip'], ascending=[False, True])
8,在/不在...裡面
SQL:IN,NOT IN
SELECT *
FROM df
WHERE tip NOT IN (1.66, 3.31)
Pandas:.isin(),~
df[~df['tip'].isin([1.66, 3.31])]
9,是否是空值
SQL:IS NULL,IS NOT NULL
SELECT total_bill
FROM df
WHERE total_bill IS NOT NULL
Pandas:.isnull(),.notnull()
df.total_bill[df['total_bill'].notnull()]
10,分組
SQL:GROUP BY
SELECT sex, COUNT(tip)
FROM df
GROUP BY sex
Pandas:.groupby()
df.groupby('sex')['tip'].count()
11,計數
SQL:COUNT
SELECT sex, COUNT(tip)
FROM df
GROUP BY sex
Pandas:.count(), .size()
df.groupby('sex')['tip'].count()
注:.count()傳回非空字元出現的次數,相當于SQL中的COUNT(col_name);而.size()傳回所有行數,相當于SQL中的COUNT(*)。
12,對分組結果進行條件過濾
SQL:HAVING
SELECT sex, COUNT(tip)
FROM df
GROUP BY sex
HAVING COUNT(tip) > 2
Pandas:
temp=df.groupby("sex")["tip"].count()
temp[temp>2]
13,聚合函數
SQL: MIN,MAX,AVG,SUM
SELECT sex, MAX(tip), SUM(total_bill)
FROM df
GROUP BY sex
Pandas:.agg()
df.groupby('sex').agg({'tip': np.max, 'total_bill': np.sum})
14,表連接配接
SQL:JOIN,LEFT JOIN,RIGHT JOIN
Pandas:.merge()
15, 表并集
SQL:UNION,UNION ALL
Pandas:pd.concat() + .drop_dupplicates(),pd.concat()