Conventions
import pandas as pd
from pandas import DataFrame
import numpy as np
DataFrame
A DataFrame is a tabular data structure with both a row index (stored in index) and a column index (stored in columns).
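I. Common attributes of DataFrame objects:
- There are many ways to create a DataFrame (more on this later). The most common is to pass in a dict of equal-length lists or NumPy arrays:
dict1={"Province":["Guangdong","Beijing","Qinghai","Fujiang"],
"year":[2018]*4,
"pop":[1.3,2.5,1.1,0.7]}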
df1=DataFrame(dict1)
df1
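Output:
| | Province | pop | year |
|---|---|---|---|
| 0 | Guangdong | 1.3 | 2018 |
| 1 | Beijing | 2.5 | 2018 |
| 2 | Qinghai | 1.1 | 2018 |
| 3 | Fujiang | 0.7 | 2018 |
(The pandas version used here sorted the dict keys alphabetically. pandas 0.23 and later on Python 3.6+ keep the dict's insertion order, so you will see the columns as Province, year, pop.)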
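As a standalone check of the dict-of-lists constructor, here is a minimal sketch that rebuilds dict1 (values read off the output table above) and shows what happens when the lists differ in length. The names `data` and `df` are illustrative only, and since the exact ValueError message may vary between pandas versions, it is printed rather than quoted.

```python
import pandas as pd

# dict1 from above: each key becomes a column label, each list a column.
data = {
    "Province": ["Guangdong", "Beijing", "Qinghai", "Fujiang"],
    "year": [2018] * 4,
    "pop": [1.3, 2.5, 1.1, 0.7],
}
df = pd.DataFrame(data)

print(df.shape)             # (4, 3)
print(df.index.tolist())    # [0, 1, 2, 3]  default row index
print(df["pop"].tolist())   # [1.3, 2.5, 1.1, 0.7]

# All lists must have the same length, otherwise pandas raises ValueError.
try:
    pd.DataFrame({"a": [1, 2], "b": [1, 2, 3]})
except ValueError as err:
    print("ValueError:", err)
```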
- As with Series, you can specify the column order (and the row index) when creating the DataFrame; a column that does not exist in the dict is filled with NaN:
df2=DataFrame(dict1,columns=['year','Province','pop','debt'],index=['one','two','three','four'])
df2
Output:
| | year | Province | pop | debt |
|---|---|---|---|---|
| one | 2018 | Guangdong | 1.3 | NaN |
| two | 2018 | Beijing | 2.5 | NaN |
| three | 2018 | Qinghai | 1.1 | NaN |
| four | 2018 | Fujiang | 0.7 | NaN |
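A minimal sketch of the `columns` and `index` arguments on their own, using the same data as dict1: `columns` selects and orders the columns, a label with no matching key becomes an all-NaN column, and (not shown in the output above) a key left out of `columns` is simply dropped. The names `data`, `df` and `only_year` are illustrative.

```python
import pandas as pd

data = {
    "Province": ["Guangdong", "Beijing", "Qinghai", "Fujiang"],
    "year": [2018] * 4,
    "pop": [1.3, 2.5, 1.1, 0.7],
}

# columns fixes the order; "debt" is not a key of data, so it is all NaN.
df = pd.DataFrame(data, columns=["year", "Province", "pop", "debt"],
                  index=["one", "two", "three", "four"])
print(list(df.columns))         # ['year', 'Province', 'pop', 'debt']
print(df["debt"].isna().all())  # True

# Keys that are not listed in columns are dropped.
only_year = pd.DataFrame(data, columns=["year"])
print(list(only_year.columns))  # ['year']
```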
- As with Series, the index and columns of a DataFrame each have a name attribute:
df2.index.name='English'
df2.columns.name='Province'
df2
Output:
Province | year | Province | pop | debt |
---|---|---|---|---|
English | ||||
one | 2018 | Guangdong | 1.3 | NaN |
two | 2018 | Beijing | 2.5 | NaN |
three | 2018 | Qinghai | 1.1 | NaN |
four | 2018 | Fujiang | 0.7 | NaN |
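Setting `index.name` and `columns.name` as above modifies the DataFrame in place. Here is a sketch of the non-mutating alternative. It assumes pandas 0.24 or later for the `index=`/`columns=` keywords of `rename_axis()`, and the axis name `"fields"` is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018], "pop": [1.3, 2.5]}, index=["one", "two"])

# rename_axis returns a new object with the axis names set; df is unchanged.
named = df.rename_axis(index="English", columns="fields")
print(named.index.name, named.columns.name)  # English fields
print(df.index.name, df.columns.name)        # None None

# Assigning the attribute directly, as in the text, changes df itself.
df.index.name = "English"
print(df.index.name)                         # English
```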
- The shape attribute gives the number of rows and columns of a DataFrame:
df2.shape
Output:
(4, 4)
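A small sketch, on a hypothetical four-row frame, of how `shape` relates to `len()` and `size`:

```python
import numpy as np
import pandas as pd

# The scalar np.nan is broadcast to fill the "debt" column.
df = pd.DataFrame({"year": [2018] * 4, "pop": [1.3, 2.5, 1.1, 0.7], "debt": np.nan},
                  index=["one", "two", "three", "four"])

n_rows, n_cols = df.shape     # shape is a plain (rows, columns) tuple
print(n_rows, n_cols)         # 4 3
print(len(df) == n_rows)      # True: len() counts rows only
print(df.size)                # 12: total number of cells, rows * columns
```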
- The values attribute also returns the DataFrame's data, as a two-dimensional ndarray:
df2.values
Output:
array([[2018, 'Guangdong', 1.3, nan],
[2018, 'Beijing', 2.5, nan],
[2018, 'Qinghai', 1.1, nan],
[2018, 'Fujiang', 0.7, nan]], dtype=object)
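The dtype of the returned array depends on the column dtypes, which is why the output above is `dtype=object`. The sketch below uses `to_numpy()`, which the pandas documentation recommends over `values` since version 0.24 and which returns the same data here. The two frames are made-up examples.

```python
import pandas as pd

mixed = pd.DataFrame({"year": [2018, 2018], "Province": ["Guangdong", "Beijing"]})
numeric = pd.DataFrame({"year": [2018, 2018], "pop": [1.3, 2.5]})

# With a string column, the only common dtype is object.
print(mixed.to_numpy().dtype)       # object
# int64 and float64 columns are upcast to one float64 array.
print(numeric.to_numpy().dtype)     # float64
print(numeric.to_numpy().tolist())  # [[2018.0, 1.3], [2018.0, 2.5]]
```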
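- Each column label is also available as an attribute of the DataFrame (this only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute or method):
df2.Province
Output: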
English
one Guangdong
two Beijing
three Qinghai
four Fujiang
Name: Province, dtype: object
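Attribute access has a catch that applies to df2 itself. df2 has a column named `pop`, and `pop` is also the name of the `DataFrame.pop()` method. Here is a minimal sketch of the difference, on a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Province": ["Guangdong", "Beijing"], "pop": [1.3, 2.5]})

print(df.Province.equals(df["Province"]))  # True
# "pop" clashes with the DataFrame.pop() method, so the attribute
# returns that bound method instead of the column.
print(callable(df.pop))                    # True
print(df["pop"].tolist())                  # [1.3, 2.5]
```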
II. Common ways to access, assign and delete data in a DataFrame:
- DataFrame_object[ ] selects columns by column label: a single label returns a Series, and a list of labels returns a DataFrame:
df2['Province']
Output:
English
one Guangdong
two Beijing
three Qinghai
four Fujiang
Name: Province, dtype: object
df2[['Province','pop']]
Output:
Province | Province | pop |
---|---|---|
English | ||
one | Guangdong | 1.3 |
two | Beijing | 2.5 |
three | Qinghai | 1.1 |
four | Fujiang | 0.7 |
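Whether `[ ]` returns a Series or a DataFrame depends on whether you pass a single label or a list, not on how many labels the list holds. Here is a quick check on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"Province": ["Guangdong", "Beijing"], "pop": [1.3, 2.5]},
                  index=["one", "two"])

print(type(df["Province"]).__name__)           # Series
print(type(df[["Province"]]).__name__)         # DataFrame: a one-label list
print(type(df[["Province", "pop"]]).__name__)  # DataFrame
```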
- DataFrame_object.loc[ ] selects rows by row label:
df2.loc['one']
Output:
Province
year 2018
Province Guangdong
pop 1.3
debt NaN
Name: one, dtype: object
df2.loc['one':'three']
Output:
Province | year | Province | pop | debt |
---|---|---|---|---|
English | ||||
one | 2018 | Guangdong | 1.3 | NaN |
two | 2018 | Beijing | 2.5 | NaN |
three | 2018 | Qinghai | 1.1 | NaN |
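Note that `df2.loc['one':'three']` above includes the row `'three'`, because slicing by label includes the end point. Here is a sketch contrasting this with positional slicing through `.iloc`, which excludes it, on a hypothetical one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"pop": [1.3, 2.5, 1.1, 0.7]},
                  index=["one", "two", "three", "four"])

# Label slices with .loc include the end label ...
print(df.loc["one":"three"].index.tolist())  # ['one', 'two', 'three']
# ... while positional slices with .iloc exclude the end position.
print(df.iloc[0:2].index.tolist())           # ['one', 'two']
```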
- It can also fetch a single value, given a row label and a column label:
df2.loc['one','Province']
Output:
'Guangdong'
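For a single value, pandas also provides `.at` (by label) and `.iat` (by integer position), which accept exactly one row and one column. Here is a minimal sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018], "Province": ["Guangdong", "Beijing"]},
                  index=["one", "two"])

print(df.loc["one", "Province"])  # Guangdong
print(df.at["one", "Province"])   # Guangdong: one row label, one column label
print(df.iat[1, 1])               # Beijing: positional counterpart of .at
```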
- A column can be modified by assigning to it, either a single value or an array of values:
df2["debt"]=np.arange(2,3,0.25)
df2
Output:
Province | year | Province | pop | debt |
---|---|---|---|---|
English | ||||
one | 2018 | Guangdong | 1.3 | 2.00 |
two | 2018 | Beijing | 2.5 | 2.25 |
three | 2018 | Qinghai | 1.1 | 2.50 |
four | 2018 | Fujiang | 0.7 | 2.75 |
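Here is a sketch of three kinds of right-hand side, on a hypothetical frame indexed like df2. A scalar is broadcast to every row, and an array must have exactly `len(df)` elements. A Series is aligned on the index, so any row it lacks becomes NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pop": [1.3, 2.5, 1.1, 0.7]},
                  index=["one", "two", "three", "four"])

df["debt"] = 0.0                      # scalar: broadcast to every row
df["debt"] = np.arange(2, 3, 0.25)    # array: 2.0, 2.25, 2.5, 2.75, length must equal len(df)

# A Series is aligned on the index; rows it does not cover get NaN.
df["debt"] = pd.Series([-1.2, -1.5], index=["two", "four"])
print(df["debt"].tolist())            # [nan, -1.2, nan, -1.5]

# A list or array of the wrong length raises ValueError (df is left unchanged).
try:
    df["debt"] = [1, 2]
except ValueError as err:
    print("ValueError:", err)
```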
- Assigning to a column that does not exist creates a new column, which del can then remove:
df2['eastern']=df2.Province=='Guangdong'
df2
Output:
Province | year | Province | pop | debt | eastern |
---|---|---|---|---|---|
English | |||||
one | 2018 | Guangdong | 1.3 | 2.00 | True |
two | 2018 | Beijing | 2.5 | 2.25 | False |
three | 2018 | Qinghai | 1.1 | 2.50 | False |
four | 2018 | Fujiang | 0.7 | 2.75 | False |
del df2['eastern']
df2.columns
Output:
Index(['year', 'Province', 'pop', 'debt'], dtype='object', name='Province')
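`del` is not the only way to remove a column. Here is a sketch comparing `drop(columns=...)`, which returns a new DataFrame, with `pop()`, which removes the column in place and returns it. The frame is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018], "pop": [1.3, 2.5]})
df["eastern"] = [True, False]

# drop() returns a new DataFrame and leaves df untouched.
trimmed = df.drop(columns="eastern")
print(list(trimmed.columns), list(df.columns))  # ['year', 'pop'] ['year', 'pop', 'eastern']

# pop() removes the column from df and returns it as a Series.
eastern = df.pop("eastern")
print(eastern.tolist(), list(df.columns))       # [True, False] ['year', 'pop']
```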
- And of course, a DataFrame can be transposed:
df2.T
Output:
English | one | two | three | four |
---|---|---|---|---|
Province | ||||
year | 2018 | 2018 | 2018 | 2018 |
Province | Guangdong | Beijing | Qinghai | Fujiang |
pop | 1.3 | 2.5 | 1.1 | 0.7 |
debt | 2 | 2.25 | 2.5 | 2.75 |
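After transposing, each new column mixes values from columns of different dtypes, so pandas falls back to object dtype. Here is a minimal sketch on a three-row subset of the example data:

```python
import pandas as pd

df = pd.DataFrame({"year": [2018, 2018, 2018],
                   "Province": ["Guangdong", "Beijing", "Qinghai"]},
                  index=["one", "two", "three"])

t = df.T
print(t.shape)             # (2, 3): rows and columns swap
print(t.index.tolist())    # ['year', 'Province']
# Each new column holds an int and a str, so every column is object dtype.
print(all(dtype == object for dtype in t.dtypes))  # True
```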
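III. Various ways to create a DataFrame
- Calling DataFrame() converts data in many formats into a DataFrame object. Its three parameters, data, index and columns, are the data, the row index and the column index respectively. data can be:
1. A two-dimensional array
df3=pd.DataFrame(np.random.randint(0,10,(4,4)),index=[1,2,3,4],columns=['A','B','C','D'])
df3
Output (the values are random, so yours will differ):
| | A | B | C | D |
|---|---|---|---|---|
| 1 | 9 | 8 | 4 | 6 |
| 2 | 5 | 7 | 7 | 4 |
| 3 | 6 | 3 | 2 | 0 |
| 4 | 4 | 6 | 9 | 8 |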
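Because `np.random.randint` draws new numbers on every run, the sketch below uses NumPy's seeded `default_rng` generator instead, so the frame is reproducible. This is a substitution, not what the example above uses. The sketch also shows that `index`/`columns` must agree with the array's shape.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)      # seeded, so reruns give the same numbers
arr = rng.integers(0, 10, size=(4, 4))   # integers in [0, 10)
df = pd.DataFrame(arr, index=[1, 2, 3, 4], columns=["A", "B", "C", "D"])
print(df)

# Labels that do not match the array's shape raise ValueError.
try:
    pd.DataFrame(arr, columns=["A", "B", "C"])
except ValueError as err:
    print("ValueError:", err)
```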
2. A dict
The row index is determined by index, and the column index by the dict's keys:
dict1
Output:
{'Province': ['Guangdong', 'Beijing', 'Qinghai', 'Fujiang'],
'pop': [1.3, 2.5, 1.1, 0.7],
'year': [2018, 2018, 2018, 2018]}
df4=pd.DataFrame(dict1,index=[1,2,3,4])
df4
Output:
| | Province | pop | year |
|---|---|---|---|
| 1 | Guangdong | 1.3 | 2018 |
| 2 | Beijing | 2.5 | 2018 |
| 3 | Qinghai | 1.1 | 2018 |
| 4 | Fujiang | 0.7 | 2018 |
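The dict's values can also be Series rather than lists. In that case, as this sketch with made-up data shows, the rows are the union of the Series' indexes, and missing entries become NaN. The row order of the union is not asserted here, so the sketch sorts it before printing.

```python
import pandas as pd

pop = pd.Series([1.3, 2.5], index=["Guangdong", "Beijing"])
year = pd.Series([2018, 2018, 2018], index=["Guangdong", "Beijing", "Qinghai"])

# Rows are the union of both indexes; "Qinghai" has no pop value.
df = pd.DataFrame({"pop": pop, "year": year})
print(sorted(df.index))                   # ['Beijing', 'Guangdong', 'Qinghai']
print(pd.isna(df.loc["Qinghai", "pop"]))  # True
```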
3. A structured array
The column index is determined by the field names of the structured array:
arr=np.array([('item1',10),('item2',20),('item3',30),('item4',40)],dtype=[("name","S10"),("count",int)])
df5=pd.DataFrame(arr)
df5
Output:
| | name | count |
|---|---|---|
| 0 | b'item1' | 10 |
| 1 | b'item2' | 20 |
| 2 | b'item3' | 30 |
| 3 | b'item4' | 40 |
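The `b'...'` values above appear because the `"S10"` field stores bytes. Here is a sketch of decoding them into regular strings with `Series.str.decode()`. Declaring the field as `"U10"` instead should give str values directly.

```python
import numpy as np
import pandas as pd

arr = np.array([("item1", 10), ("item2", 20)],
               dtype=[("name", "S10"), ("count", int)])
df = pd.DataFrame(arr)
print(df["name"].tolist())   # [b'item1', b'item2']: "S10" holds bytes

# Decode the bytes into ordinary Python strings.
df["name"] = df["name"].str.decode("utf-8")
print(df["name"].tolist())   # ['item1', 'item2']
```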
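- In addition, the class methods whose names start with from_ convert specific kinds of data into a DataFrame. For example, the orient parameter of from_dict() specifies whether the dict keys become columns or rows; the default is "columns":
dict2={"a":[1,2,3],"b":[4,5,6]}
df6=pd.DataFrame.from_dict(dict2)
df6
Output:
| | a | b |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |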
df7=pd.DataFrame.from_dict(dict2,orient="index")
df7
Output:
| | 0 | 1 | 2 |
|---|---|---|---|
| a | 1 | 2 | 3 |
| b | 4 | 5 | 6 |
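With `orient="index"`, `from_dict()` also accepts a `columns` argument for labelling the columns (pandas 0.23+). Here is a minimal sketch using dict2. The labels `x`, `y`, `z` are hypothetical.

```python
import pandas as pd

dict2 = {"a": [1, 2, 3], "b": [4, 5, 6]}

# Keys become row labels; columns names the resulting columns.
df = pd.DataFrame.from_dict(dict2, orient="index", columns=["x", "y", "z"])
print(df.loc["b", "y"])  # 5

# Same values as transposing the default orient="columns" result.
same = df.to_numpy().tolist() == pd.DataFrame.from_dict(dict2).T.to_numpy().tolist()
print(same)              # True
```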
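IV. Converting a DataFrame to other data formats
- The to_dict() method converts a DataFrame to a dict. The orient parameter determines the shape of the result; the default, "dict", maps each column label to a {row label: value} dict:
df7.to_dict()
Output: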
{0: {'a': 1, 'b': 4}, 1: {'a': 2, 'b': 5}, 2: {'a': 3, 'b': 6}}
df7.to_dict(orient="records")
Output:
[{0: 1, 1: 2, 2: 3}, {0: 4, 1: 5, 2: 6}]
df7.to_dict(orient="list")
Output:
{0: [1, 4], 1: [2, 5], 2: [3, 6]}
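Here are two more `orient` behaviours, sketched on a hypothetical frame with string row labels. `"index"` nests the dict by row, and a round trip through `"records"` loses the row labels.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [2, 5]}, index=["r1", "r2"])

# orient="index": {row label: {column label: value}}
print(df.to_dict(orient="index"))  # {'r1': {'a': 1, 'b': 2}, 'r2': {'a': 4, 'b': 5}}

# orient="records" has no row labels, so rebuilding gives a fresh 0..n-1 index.
back = pd.DataFrame(df.to_dict(orient="records"))
print(back.index.tolist())                     # [0, 1]
print(back.equals(df.reset_index(drop=True)))  # True
```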
- Similar methods include to_records(), to_csv() and others.
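Here is a minimal sketch of the two methods just mentioned, on a made-up frame. `to_csv()` returns the CSV text when no path is given, and `to_records()` returns a NumPy record array.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [2, 5]})

# With no path argument, to_csv() returns the CSV text as a str.
csv_text = df.to_csv(index=False)
print(csv_text.splitlines())  # ['a,b', '1,2', '4,5']

# Reading the text back gives an equal DataFrame.
print(pd.read_csv(io.StringIO(csv_text)).equals(df))  # True

# to_records() returns a NumPy record array; index=False omits the index field.
rec = df.to_records(index=False)
print(rec.dtype.names)        # ('a', 'b')
```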
Thanks for reading,
I hope this helps,
and let's keep learning together!