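Study notes. These notes are built mainly around examples.
Development tool: Spyder
Table of contents
- Introduction to pandas
- Series
- Creating a Series
- Accessing data in a Series
- Date handling in pandas
- DatetimeIndex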
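Introduction to pandas
pandas is a tool built on NumPy and created for data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work efficiently with large structured datasets.
Series
A Series can be thought of as a one-dimensional array whose index labels you can set yourself. It resembles a fixed-length ordered dictionary, with an index and values.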
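To make the "ordered dictionary" comparison concrete, here is a minimal sketch. The names and scores are made up for illustration; only standard Series operations are used.

```python
import pandas as pd

# A Series pairs each value with an index label.
s = pd.Series([90, 85, 70], index=["ada", "bunny", "jack"])

# Dict-like behaviour: membership tests and lookups go through the labels.
has_ada = "ada" in s
ada_score = s["ada"]

# The index labels can be replaced after creation; the values keep their order.
s.index = ["a", "b", "c"]
labels = list(s.index)
values = s.tolist()

print(has_ada, ada_score)
print(labels, values)
```

Unlike a plain dict, the length is fixed by the data, and the new index must have exactly as many labels as there are values.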
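Creating a Series
- Syntax
import numpy as np
import pandas as pd
# Create an empty Series (an explicit dtype avoids a warning on some pandas versions)
s = pd.Series(dtype="float64")
# Create a Series from an ndarray
data = np.array(['a','b','c','d'])
s = pd.Series(data)
s = pd.Series(data,index=[100,101,102,103])
# Create a Series from a dictionary
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
# Create a Series from a scalar
s = pd.Series(5, index=[0, 1, 2, 3])
- Examples
Code 1 (create a Series from an ndarray):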
import numpy as np
import pandas as pd
data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])
s1 = pd.Series(data)
print(s1)
Result 1:
0 Ada
1 Bunny
2 Jack
3 Black
dtype: object
Code 2 (custom index):
s2 = pd.Series(data, index = [10, 20, 30, 40])
print(s2)
Result 2:
10 Ada
20 Bunny
30 Jack
40 Black
dtype: object
Code 3 (create a Series from a dictionary):
data = {"a":0, "b":1, "c":2, 'e':3}
# The dictionary keys become the index of the Series
s3 = pd.Series(data)
print(s3)
Result 3:
a 0
b 1
c 2
e 3
dtype: int64
Code 4 (create a Series from a scalar):
s4 = pd.Series(10, index = [0, 1, 2, 3])
print(s4)
Result 4:
0 10
1 10
2 10
3 10
dtype: int64
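Accessing data in a Series
- Syntax
# Retrieve elements by position
# (.iloc[0] is explicit; plain s[0] on a labeled index is deprecated in recent pandas versions)
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0], s[:3], s[-3:])
# Retrieve data by label
print(s['a'], s[['a','c','d']])
- Examples
Code 1: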
import numpy as np
import pandas as pd
data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])
s = pd.Series(data, index = ["a", "b", "c", "d"])
print(s.iloc[0], '\n\n',s[:3],'\n\n', s[-3: ])
Result 1:
Ada
a Ada
b Bunny
c Jack
dtype: object
b Bunny
c Jack
d Black
dtype: object
Code 2:
print(s["a"], '\n\n',s[["a", "b", "c"]])
Result 2:
Ada
a Ada
b Bunny
c Jack
dtype: object
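Positional and label-based access can also be written with the explicit .iloc and .loc indexers, which avoids any ambiguity about which kind of key is meant. A small sketch reusing the names from the example above:

```python
import pandas as pd

s = pd.Series(["Ada", "Bunny", "Jack", "Black"], index=["a", "b", "c", "d"])

first_by_position = s.iloc[0]      # position 0
first_by_label = s.loc["a"]        # label "a"
head_by_position = s.iloc[:2]      # positions 0 and 1 (end excluded)
head_by_label = s.loc["a":"b"]     # labels "a" through "b" (end included)

print(first_by_position, first_by_label)
print(head_by_position.tolist(), head_by_label.tolist())
```

Note the difference in slicing: positional slices exclude the end, label slices include it.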
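Date handling in pandas
- Syntax
# Date string formats that pandas can recognize
dates = pd.Series(['2011', '2011-02', '2011-03-01', '2011/04/01', '2011/05/01 01:01:01', '01 Jun 2011'])
# to_datetime() converts them to the datetime type
# (pandas 2.0+ expects one format per call; for mixed formats like these, pass format='mixed')
dates = pd.to_datetime(dates)
- Examples
Code 1 (recognizing dates):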
import numpy as np
import pandas as pd
dates = pd.Series(['1997', '2015-09', '2019-03-01',
'2019/04/01', '2019/05/01 01:01:01',
'01 Jun 2019'])
print(dates)
print("-"*20)
# On pandas 2.0+, mixed formats like these need pd.to_datetime(dates, format='mixed')
dates = pd.to_datetime(dates)
print(dates)
Result 1:
0 1997
1 2015-09
2 2019-03-01
3 2019/04/01
4 2019/05/01 01:01:01
5 01 Jun 2019
dtype: object
--------------------
0 1997-01-01 00:00:00
1 2015-09-01 00:00:00
2 2019-03-01 00:00:00
3 2019-04-01 00:00:00
4 2019-05-01 01:01:01
5 2019-06-01 00:00:00
dtype: datetime64[ns]
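When the strings do not share one format, or some of them are not dates at all, two defensive patterns are useful. This is a sketch, not the only approach: parsing each string on its own should work regardless of pandas version, and errors="coerce" turns unparseable entries into NaT. The sample strings are chosen for illustration.

```python
import pandas as pd

raw = pd.Series(["1997", "2015-09", "2019/04/01", "01 Jun 2019"])

# Parse each string on its own, so no single format has to fit the whole column.
parsed = pd.Series([pd.to_datetime(text) for text in raw])

# With errors="coerce", unparseable entries become NaT instead of raising.
messy = pd.Series(["2019-03-01", "not a date"])
coerced = pd.to_datetime(messy, errors="coerce")

print(parsed)
print(coerced)
```

Element-wise parsing is slower than a single vectorized call, so it is best kept for small or genuinely irregular inputs.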
Code 2 (date arithmetic):
delta = dates - pd.to_datetime('1970-01-01')
print(delta)
print("-"*20)
# The Series .dt accessor exposes the components of the time deltas
print(delta.dt.days)
Result 2:
0 9862 days 00:00:00
1 16679 days 00:00:00
2 17956 days 00:00:00
3 17987 days 00:00:00
4 18017 days 01:01:01
5 18048 days 00:00:00
dtype: timedelta64[ns]
--------------------
0 9862
1 16679
2 17956
3 17987
4 18017
5 18048
dtype: int64
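Note that .dt.days keeps only the whole-day part of each difference. If the time-of-day part matters, the full duration can be read in seconds or as fractional days. A short sketch using two of the dates from above:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2019-05-01 01:01:01", "2019-06-01 00:00:00"]))
delta = dates - pd.Timestamp("1970-01-01")

whole_days = delta.dt.days                       # integer days, time-of-day dropped
seconds = delta.dt.total_seconds()               # full duration in seconds (floats)
fractional_days = delta / pd.Timedelta(days=1)   # full duration in days (floats)

print(whole_days.tolist())
print(seconds.tolist())
print(fractional_days.tolist())
```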
Series.dt provides many date-related operations; some of them are listed below:
Series.dt attribute | Meaning |
Series.dt.year | The year of the datetime. |
Series.dt.month | The month as January=1, December=12. |
Series.dt.day | The day of the datetime. |
Series.dt.hour | The hours of the datetime. |
Series.dt.minute | The minutes of the datetime. |
Series.dt.second | The seconds of the datetime. |
Series.dt.microsecond | The microseconds of the datetime. |
Series.dt.week | The week ordinal of the year (deprecated in pandas 1.1, removed in 2.0; use Series.dt.isocalendar().week). |
Series.dt.weekofyear | The week ordinal of the year (deprecated in pandas 1.1, removed in 2.0; use Series.dt.isocalendar().week). |
Series.dt.dayofweek | The day of the week with Monday=0, Sunday=6. |
Series.dt.weekday | The day of the week with Monday=0, Sunday=6. |
Series.dt.dayofyear | The ordinal day of the year. |
Series.dt.quarter | The quarter of the date. |
Series.dt.is_month_start | Indicates whether the date is the first day of the month. |
Series.dt.is_month_end | Indicates whether the date is the last day of the month. |
Series.dt.is_quarter_start | Indicates whether the date is the first day of a quarter. |
Series.dt.is_quarter_end | Indicates whether the date is the last day of a quarter. |
Series.dt.is_year_start | Indicates whether the date is the first day of a year. |
Series.dt.is_year_end | Indicates whether the date is the last day of the year. |
Series.dt.is_leap_year | Boolean indicator if the date belongs to a leap year. |
Series.dt.days_in_month | The number of days in the month. |
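A few of these attributes side by side, as a sketch on three hand-picked dates. It assumes pandas 1.1 or newer, where dt.isocalendar() is available as the way to get the ISO week number.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2019-03-01", "2019-12-31", "2020-02-29"]))

years = dates.dt.year
weekdays = dates.dt.dayofweek             # Monday=0 ... Sunday=6
leap = dates.dt.is_leap_year
month_end = dates.dt.is_month_end
iso_weeks = dates.dt.isocalendar().week   # ISO 8601 week number

print(years.tolist())      # [2019, 2019, 2020]
print(weekdays.tolist())   # [4, 1, 5]
print(leap.tolist())       # [False, False, True]
print(month_end.tolist())  # [False, True, True]
print(iso_weeks.tolist())  # [9, 1, 9]
```

The ISO week of 2019-12-31 is 1 because that day already belongs to the first ISO week of 2020.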
Code 3 (demonstrating the dt accessor):
print(dates.dt.month)
Result 3:
0 1
1 9
2 3
3 4
4 5
5 6
dtype: int64
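DatetimeIndex
By specifying a number of periods and a frequency, pd.date_range() creates a sequence of dates.
- Syntax
import pandas as pd
# Daily frequency (the default), starting at 2019/08/21, 5 dates
datelist = pd.date_range('2019/08/21', periods = 5)
# Month-end frequency (pandas 2.2+ prefers the alias 'ME' over 'M')
datelist = pd.date_range('2019/08/21', periods=5,freq='M')
# Build a date sequence over a given interval
# (pd.datetime was removed in pandas 2.0; pd.Timestamp works across versions)
start = pd.Timestamp(2017, 11, 1)
end = pd.Timestamp(2017, 11, 5)
dates = pd.date_range(start, end)
- Examples
Code 1: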
import numpy as np
import pandas as pd
dates1 = pd.date_range('2020-01-01', periods = 5,
freq = 'D')
print(dates1)
print("-"*20)
dates2 = pd.date_range('2015-01-10', periods = 5,
freq = 'M')
print(dates2)
print("-"*20)
# pd.datetime was removed in pandas 2.0; pd.Timestamp works across versions
start_num = pd.Timestamp(2019, 1, 1)
end_num = pd.Timestamp(2019, 1, 5)
dates3 = pd.date_range(start_num, end_num)
print(dates3)
Result 1:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05'],
dtype='datetime64[ns]', freq='D')
--------------------
DatetimeIndex(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30',
'2015-05-31'],
dtype='datetime64[ns]', freq='M')
--------------------
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
'2019-01-05'],
dtype='datetime64[ns]', freq='D')
Code 2 (business days only):
dates1 = pd.bdate_range('2020-01-01', periods = 10)
print(dates1)
Result 2:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
'2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
'2020-01-13', '2020-01-14'],
dtype='datetime64[ns]', freq='B')
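To see what bdate_range leaves out, it can be compared with a plain date_range over the same start date. A minimal sketch; the 7-day and 5-day lengths are arbitrary choices for illustration.

```python
import pandas as pd

days = pd.date_range("2020-01-01", periods=7)     # calendar days
bdays = pd.bdate_range("2020-01-01", periods=5)   # business days only

# dayofweek is Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
weekend_days = days[days.dayofweek >= 5]

print(weekend_days)
print(bdays)
```

The two weekend dates found in the calendar range (2020-01-04 and 2020-01-05) are exactly the ones missing from the business-day range.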