
pandas Basics (Part 1) -- Series

Study notes; these notes are mainly example-driven.

Development tool: Spyder

Contents

  • Introduction to pandas
  • Series
  • Creating a Series
  • Accessing Data in a Series
  • Date Handling in pandas
  • DatetimeIndex

Introduction to pandas

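pandas is a library built on top of NumPy, created for data-analysis tasks. It incorporates a large number of libraries and several standard data models (such as Series and DataFrame), and provides the tools needed to manipulate large structured datasets efficiently.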

Series

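A Series can be thought of as a one-dimensional array whose index labels you can set yourself. It behaves like a fixed-length, ordered dictionary: it has an index (the keys) and values.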
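The dict-like behavior described above can be seen directly. A minimal sketch; the labels and numbers are made up for illustration:

```python
import pandas as pd

# A Series pairs an index (the labels) with values, much like an ordered dict
s = pd.Series([90, 85, 77], index=['Ada', 'Bunny', 'Jack'])

print(s.index.tolist())   # ['Ada', 'Bunny', 'Jack']
print(s.tolist())         # [90, 85, 77]
print('Bunny' in s)       # True: `in` checks the index labels, like dict keys
print(90 in s)            # False: values are not searched
print(s['Jack'])          # 77: look up a value by its label

# The labels can be replaced, but only with the same number of labels
# (assigning a list of a different length raises ValueError)
s.index = ['a', 'b', 'c']
print(s.index.tolist())   # ['a', 'b', 'c']
```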

Creating a Series

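  • Syntax
import numpy as np
import pandas as pd

# Create an empty Series
s = pd.Series(dtype=object)
# Create a Series from an ndarray (default index 0..n-1)
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
# The same data with a custom index
s = pd.Series(data, index=[100, 101, 102, 103])
# Create a Series from a dict (the keys become the index)
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
# Create a Series from a scalar (the value is repeated for every index label)
s = pd.Series(5, index=[0, 1, 2, 3])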
  • Examples

Code 1 (creating a Series from an ndarray):

import numpy as np
import pandas as pd

data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])

s1 = pd.Series(data)
print(s1)      

Output 1:

0      Ada
1    Bunny
2     Jack
3    Black
dtype: object      

Code 2 (custom index):

s2 = pd.Series(data, index = [10, 20, 30, 40])
print(s2)      

Output 2:

10      Ada
20    Bunny
30     Jack
40    Black
dtype: object      

Code 3 (creating a Series from a dict):

data = {"a":0, "b":1, "c":2, 'e':3}
# The dict keys become the Series index
s3 = pd.Series(data)
print(s3)      

Output 3:

a    0
b    1
c    2
e    3
dtype: int64      

Code 4 (creating a Series from a scalar):

s4 = pd.Series(10, index = [0, 1, 2, 3])
print(s4)      

Output 4:

0    10
1    10
2    10
3    10
dtype: int64      
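The dict constructor also accepts an index. As a small sketch (the keys here are arbitrary), the index labels then select and order the dict keys, and a label with no matching key becomes NaN:

```python
import pandas as pd

data = {'a': 0, 'b': 1, 'c': 2}

# The index picks (and orders) the keys; 'b' is dropped and 'z' has no key,
# so it gets NaN and the whole Series becomes float64
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)
# c    2.0
# a    0.0
# z    NaN
# dtype: float64
```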

Accessing Data in a Series

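  • Syntax
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# Retrieve elements by position; .iloc is the explicit positional accessor
# (s[0] on a label index still works but is deprecated in recent pandas)
print(s.iloc[0], s.iloc[:3], s.iloc[-3:])
# Retrieve data by label
print(s['a'], s[['a', 'c', 'd']])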
  • Examples

Code 1 (by position):

import numpy as np
import pandas as pd

data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])

s = pd.Series(data, index = ["a", "b", "c", "d"])
# .iloc selects by position; s[0] on a label index is deprecated in recent pandas
print(s.iloc[0], '\n\n', s.iloc[:3], '\n\n', s.iloc[-3:])

Output 1:

Ada 

 a      Ada
b    Bunny
c     Jack
dtype: object 

 b    Bunny
c     Jack
d    Black
dtype: object      

Code 2 (by label):

print(s["a"], '\n\n',s[["a", "b", "c"]])      

Output 2:

Ada 

 a      Ada
b    Bunny
c     Jack
dtype: object      
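With an integer index such as [10, 20, 30, 40], plain [] treats integers as labels, not positions. A small sketch contrasting label-based .loc with position-based .iloc on such an index:

```python
import pandas as pd

s = pd.Series(['Ada', 'Bunny', 'Jack', 'Black'], index=[10, 20, 30, 40])

# With an integer index, s[10] is a *label* lookup, not a position
# (s[0] would raise KeyError here, because 0 is not a label)
print(s[10])                  # Ada
# .loc always uses labels, .iloc always uses positions
print(s.loc[20])              # Bunny
print(s.iloc[0])              # Ada
print(s.iloc[-1])             # Black
# Label slices with .loc include the end label; positional slices exclude it
print(s.loc[10:30].tolist())  # ['Ada', 'Bunny', 'Jack']
print(s.iloc[0:2].tolist())   # ['Ada', 'Bunny']
```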

Date Handling in pandas

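  • Syntax
import pandas as pd

# Date string formats that pandas can recognize
dates = pd.Series(['2011', '2011-02', '2011-03-01', '2011/04/01', '2011/05/01 01:01:01', '01 Jun 2011'])
# to_datetime() converts the strings to the datetime64 dtype;
# format='mixed' (pandas >= 2.0) parses each element separately,
# older versions do this without the argument
dates = pd.to_datetime(dates, format='mixed')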
  • Examples

Code 1 (recognizing dates):

import numpy as np
import pandas as pd

dates = pd.Series(['1997', '2015-09', '2019-03-01',
                   '2019/04/01', '2019/05/01 01:01:01',
                   '01 Jun 2019'])

print(dates)
print("-"*20)
# format='mixed' (pandas >= 2.0) lets each string use its own format
dates = pd.to_datetime(dates, format='mixed')
print(dates)      

Output 1:

0                   1997
1                2015-09
2             2019-03-01
3             2019/04/01
4    2019/05/01 01:01:01
5            01 Jun 2019
dtype: object
--------------------
0   1997-01-01 00:00:00
1   2015-09-01 00:00:00
2   2019-03-01 00:00:00
3   2019-04-01 00:00:00
4   2019-05-01 01:01:01
5   2019-06-01 00:00:00
dtype: datetime64[ns]      
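When every string follows one known layout, passing an explicit format is more reliable than letting pandas guess. A sketch with made-up day-first strings; errors='coerce' is shown as one way to handle strings that cannot be parsed:

```python
import pandas as pd

raw = pd.Series(['01/03/2019', '02/03/2019', '15/03/2019'])

# An explicit strptime-style format removes ambiguity: here the day comes first
parsed = pd.to_datetime(raw, format='%d/%m/%Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2019-03-01', '2019-03-02', '2019-03-15']

# errors='coerce' turns unparseable strings into NaT instead of raising
coerced = pd.to_datetime(pd.Series(['2019-03-01', 'not a date']), errors='coerce')
print(coerced.isna().tolist())  # [False, True]
```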

Code 2 (date arithmetic):

delta = dates - pd.to_datetime('1970-01-01')
print(delta)
print("-"*20)
# The .dt accessor of a Series exposes the components of each time delta
print(delta.dt.days)      

Output 2:

0    9862 days 00:00:00
1   16679 days 00:00:00
2   17956 days 00:00:00
3   17987 days 00:00:00
4   18017 days 01:01:01
5   18048 days 00:00:00
dtype: timedelta64[ns]
--------------------
0     9862
1    16679
2    17956
3    17987
4    18017
5    18048
dtype: int64      
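Date arithmetic also works in the other direction: adding a pd.Timedelta shifts every date by the same amount. A minimal sketch reusing two of the dates above, built directly as Timestamps:

```python
import pandas as pd

dates = pd.Series([pd.Timestamp('1997-01-01'), pd.Timestamp('2019-05-01 01:01:01')])

# Subtracting a Timestamp gives timedelta64 values
delta = dates - pd.Timestamp('1970-01-01')
print(delta.dt.days.tolist())   # [9862, 18017]

# Adding a Timedelta shifts every date by the same offset
shifted = dates + pd.Timedelta(days=30)
print(shifted.dt.strftime('%Y-%m-%d %H:%M:%S').tolist())
# ['1997-01-31 00:00:00', '2019-05-31 01:01:01']

# dt.total_seconds() keeps the sub-day part that dt.days drops
print(delta.dt.total_seconds().tolist())  # [852076800.0, 1556672461.0]
```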

Series.dt provides many date-related attributes; some of them are listed below:

Series.dt attribute            Meaning
Series.dt.year                 The year of the datetime.
Series.dt.month                The month, with January=1 and December=12.
Series.dt.day                  The day of the month.
Series.dt.hour                 The hour of the datetime.
Series.dt.minute               The minute of the datetime.
Series.dt.second               The second of the datetime.
Series.dt.microsecond          The microsecond of the datetime.
Series.dt.isocalendar().week   The ISO week ordinal of the year.
Series.dt.dayofweek            The day of the week, with Monday=0 and Sunday=6.
Series.dt.weekday              Same as dayofweek.
Series.dt.dayofyear            The ordinal day of the year.
Series.dt.quarter              The quarter of the date.
Series.dt.is_month_start       Whether the date is the first day of the month.
Series.dt.is_month_end         Whether the date is the last day of the month.
Series.dt.is_quarter_start     Whether the date is the first day of a quarter.
Series.dt.is_quarter_end       Whether the date is the last day of a quarter.
Series.dt.is_year_start        Whether the date is the first day of a year.
Series.dt.is_year_end          Whether the date is the last day of a year.
Series.dt.is_leap_year         Whether the date belongs to a leap year.
Series.dt.days_in_month        The number of days in the month.

Series.dt.week and Series.dt.weekofyear were deprecated in pandas 1.1 and removed in pandas 2.0; Series.dt.isocalendar().week returns the week number instead.

Code 3 (demonstrating the dt accessor):

print(dates.dt.month)      

Output 3:

0    1
1    9
2    3
3    4
4    5
5    6
dtype: int64      
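A few more attributes from the table, applied to three arbitrary dates chosen to hit a month start, a year end and a leap-year February:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2019-03-01', '2019-12-31', '2020-02-15']))

print(dates.dt.year.tolist())            # [2019, 2019, 2020]
print(dates.dt.quarter.tolist())         # [1, 4, 1]
print(dates.dt.dayofweek.tolist())       # [4, 1, 5] (Friday, Tuesday, Saturday)
print(dates.dt.is_month_start.tolist())  # [True, False, False]
print(dates.dt.is_year_end.tolist())     # [False, True, False]
print(dates.dt.days_in_month.tolist())   # [31, 31, 29]

# ISO week numbers; 2019-12-31 already falls in ISO week 1 of 2020
weeks = [int(w) for w in dates.dt.isocalendar().week]
print(weeks)                             # [9, 1, 7]
```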

DatetimeIndex

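Given a start date, a number of periods (or an end date) and a frequency, the pd.date_range() function creates a sequence of dates as a DatetimeIndex.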

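  • Syntax
import pandas as pd

# Daily frequency (the default): 5 dates starting from 2019/08/21
datelist = pd.date_range('2019/08/21', periods=5)

# Month-end frequency ('M'; pandas >= 2.2 prefers the alias 'ME')
datelist = pd.date_range('2019/08/21', periods=5, freq='M')

# A date sequence over a given interval (both ends included);
# pd.datetime was removed in pandas 2.0, so use pd.Timestamp
start = pd.Timestamp(2017, 11, 1)
end = pd.Timestamp(2017, 11, 5)
dates = pd.date_range(start, end)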
  • Examples

Code 1:

import numpy as np
import pandas as pd


dates1 = pd.date_range('2020-01-01', periods = 5,
                       freq = 'D')
print(dates1)

print("-"*20)

dates2 = pd.date_range('2015-01-10', periods = 5,
                       freq = 'M')
print(dates2)

print("-"*20)

# pd.datetime was removed in pandas 2.0; pd.Timestamp works in all versions
start_num = pd.Timestamp(2019, 1, 1)
end_num = pd.Timestamp(2019, 1, 5)
dates3 = pd.date_range(start_num, end_num)
print(dates3)      

Output 1:

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
--------------------
DatetimeIndex(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30',
               '2015-05-31'],
              dtype='datetime64[ns]', freq='M')
--------------------
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05'],
              dtype='datetime64[ns]', freq='D')      
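Besides start plus periods and start plus end, date_range also accepts end plus periods, and freq can be a multiple or an anchored offset. A sketch using the standard aliases '2D' (every two days) and 'W-MON' (weekly on Mondays):

```python
import pandas as pd

# start + end + freq: every second day in the interval
every_2_days = pd.date_range('2019-01-01', '2019-01-09', freq='2D')
print(every_2_days.strftime('%Y-%m-%d').tolist())
# ['2019-01-01', '2019-01-03', '2019-01-05', '2019-01-07', '2019-01-09']

# end + periods: count backwards from the end date (daily by default)
last_three = pd.date_range(end='2019-01-31', periods=3)
print(last_three.strftime('%Y-%m-%d').tolist())
# ['2019-01-29', '2019-01-30', '2019-01-31']

# Anchored weekly frequency: every Monday in January 2019
mondays = pd.date_range('2019-01-01', '2019-01-31', freq='W-MON')
print(mondays.strftime('%Y-%m-%d').tolist())
# ['2019-01-07', '2019-01-14', '2019-01-21', '2019-01-28']
```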

Code 2 (business days only, via pd.bdate_range):

dates1 = pd.bdate_range('2020-01-01', periods = 10)
print(dates1)

Output 2:

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-13', '2020-01-14'],
              dtype='datetime64[ns]', freq='B')
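bdate_range defaults to a business-day frequency; it skips Saturdays and Sundays but not public holidays. A quick check that it matches date_range with freq='B':

```python
import pandas as pd

b1 = pd.bdate_range('2020-01-01', periods=10)
# The same business-day sequence via date_range with freq='B'
b2 = pd.date_range('2020-01-01', periods=10, freq='B')
print(b1.equals(b2))             # True
# No Saturdays (5) or Sundays (6) appear
print((b1.dayofweek < 5).all())  # True
```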