On batch handling of time data in pandas

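A problem I ran into at work: computing the charging duration of new-energy vehicles by region. The data source is one day of global operating data from 北理新源.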

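Only four regions are covered here: Beijing, Shanghai, Guangdong and Chongqing. The data-preparation code is omitted; what needs to be ready is four DataFrames, one for each of these regions: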

import pandas as pd
# pyecharts 0.5.x-style imports; from 1.0 on the chart classes live in pyecharts.charts
from pyecharts import Boxplot,Pie,Page
theme_echart='infographic'

location_list=['shanghai','chongqing','guangdong','beijing']
ans_vid={}

for i in location_list:
    ans_vid[i]=pd.read_hdf(i+'_charging.h5',encoding='gbk')

# Re-key the dict with the Chinese region names used in the chart titles.
location_list_chinese=['上海','重慶','廣東','北京']
for en,zh in zip(location_list,location_list_chinese):
    ans_vid[zh] = ans_vid.pop(en)

Example:
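As a rough illustration of the shape each DataFrame is expected to have, here is a tiny stand-in. The column names are the ones the code in this post uses; the row values, the non-charging status '3.0', and the integer dtype of 上報時間 are assumptions made for illustration, not real data.

```python
import pandas as pd

# Hypothetical rows; the real data comes from the *_charging.h5 files.
sample = pd.DataFrame({
    'vid': ['A', 'A', 'A', 'B', 'B'],             # vehicle id
    '上報時間': [20190101080000, 20190101083000,   # report time, YYYYmmddHHMMSS
              20190101120000, 20190101090000,
              20190101100000],
    '充電狀态': ['1.0', '1.0', '3.0', '1.0', '1.0'],  # '1.0' = charging (assumed stored as text)
})

# Keep only the rows where the vehicle is charging.
charging_rows = sample[sample['充電狀态'] == '1.0']
print(charging_rows)
```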

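Next we extract the time series: for every vid, take the first and the last report time (column 上報時間) at which the charging status (column 充電狀态) is 1; the difference between the two is that vehicle's charging duration.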

The code is as follows:

page=Page()   
for i in location_list_chinese:
    ans_vid[i]=ans_vid[i][ans_vid[i]['充電狀态']=='1.0']
    temp1=ans_vid[i].drop_duplicates(['vid'],keep='last')
    temp2=ans_vid[i].drop_duplicates(['vid'],keep='first')
    a=temp2['上報時間']
    b=temp1['上報時間']
    a=a.reset_index()
    b=b.reset_index()
    a=a.drop(['index'],axis=1)
    b=b.drop('index',axis=1)
    a['上報時間']=a['上報時間'].astype(str)
    a['上報時間']=a['上報時間'].apply(lambda v: v[0:4]+'-'+v[4:6]+'-'+v[6:8]+' '+v[8:10]+':'+v[10:12]+':'+v[12:14])
    b['上報時間']=b['上報時間'].astype(str)
    b['上報時間']=b['上報時間'].apply(lambda v: v[0:4]+'-'+v[4:6]+'-'+v[6:8]+' '+v[8:10]+':'+v[10:12]+':'+v[12:14])
    b['上報時間']=pd.to_datetime(b['上報時間'])
    a['上報時間']=pd.to_datetime(a['上報時間'])
    temp=b['上報時間']-a['上報時間']
    temp=pd.DataFrame(temp)
    temp['上報時間']=temp['上報時間'].dt.total_seconds()/3600
    temp['充電時長']=temp['上報時間'].astype(str)
    # assign through .loc; chained indexing (temp[col][mask]=...) may write to a copy
    temp.loc[temp['上報時間']<=1,'充電時長']='<1h'
    temp.loc[(temp['上報時間']>1) & (temp['上報時間']<=4),'充電時長']='1-4h'
    temp.loc[(temp['上報時間']>4) & (temp['上報時間']<=8),'充電時長']='4-8h'
    temp.loc[temp['上報時間']>8,'充電時長']='>8h'
    local_charging_time=temp['充電時長'].value_counts()
    box=Boxplot(i+'地區充電時長統計')
    pie=Pie(i+'地區充電時長統計')
    box.use_theme(theme_echart)
    pie.use_theme(theme_echart)
#    kwargs = dict(name = i,
#    x_axis = list(local_charging_time.index),
#    y_axis = list(local_charging_time.values),
#    is_legend_show=False,
#    is_label_show=True
#    )
#    bar.add(**kwargs)
    x=list(local_charging_time.index)
    y=list(local_charging_time.values)
    pie.add("",x,y,radius=(40,75),
               is_label_show=True,legend_orient = 'vertical',
               legend_pos = 'left',legend_top='center')
    # draw the box plot: one list of durations (in hours) per duration bucket
    y_axis =[]
    for j in x:      
        y_axis.append(list(temp['上報時間'][temp['充電時長']==j]))
    y=box.prepare_data(y_axis)   
    box.add(i+'地區各充電時長分布', x, y,xaxis_name='',
      yaxis_name='充電時長[h]',is_legend_show=True,legend_pos='right',is_label_show=True,yaxis_name_gap=45,xaxis_type='category',xaxis_rotate=0)
    page.add(pie)
    page.add(box)
    del box,pie

page.render('北上廣重地區充電時長統計_v2.html')
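One caveat about the loop above: it subtracts the keep='first' rows from the keep='last' rows by position, which pairs the right vehicles only if every vid's rows sit together in the DataFrame. A possible alternative, sketched here on hypothetical interleaved rows, is to group by vid and take the minimum and maximum report time, which does not depend on row order. The sample values are made up; the column names follow the code above.

```python
import pandas as pd

# Hypothetical sample: rows of two vehicles interleaved in time.
df = pd.DataFrame({
    'vid': ['A', 'B', 'A', 'B', 'A'],
    '上報時間': [20190101080000, 20190101090000, 20190101093000,
              20190101100000, 20190101120000],
    '充電狀态': ['1.0', '1.0', '1.0', '1.0', '1.0'],
})

charging = df[df['充電狀态'] == '1.0'].copy()
# Parse the compact timestamps in one vectorized call.
charging['上報時間'] = pd.to_datetime(charging['上報時間'].astype(str),
                                  format='%Y%m%d%H%M%S')

# First and last charging timestamp per vehicle, independent of row order.
span = charging.groupby('vid')['上報時間'].agg(['min', 'max'])
hours = (span['max'] - span['min']).dt.total_seconds() / 3600
print(hours)
```

With this sample, drop_duplicates would return the first rows in the order A, B but the last rows in the order B, A, so a positional subtraction would mix the two vehicles.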

As you can see, the core of the processing is pd.to_datetime(a['上報時間']), followed by

temp['上報時間']=temp['上報時間'].dt.total_seconds()/3600 # take the time difference in seconds, then convert it to hours
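To make these two steps concrete, here is a small sketch on made-up start and end times. Subtracting two datetime Series gives a timedelta Series, and .dt.total_seconds() turns it into plain floats. The bucketing at the end uses pd.cut as a swapped-in alternative to the four masked assignments in the loop; the bin edges mirror the thresholds used there.

```python
import pandas as pd

# Hypothetical charging start/end times.
start = pd.to_datetime(pd.Series(['2019-01-01 08:00:00', '2019-01-01 22:30:00',
                                  '2019-01-01 12:00:00', '2019-01-01 10:00:00']))
end = pd.to_datetime(pd.Series(['2019-01-01 09:30:00', '2019-01-02 07:00:00',
                                '2019-01-01 12:30:00', '2019-01-01 14:00:00']))

delta = end - start                        # timedelta Series
hours = delta.dt.total_seconds() / 3600    # float hours, days included

# Right-closed bins: (-inf,1], (1,4], (4,8], (8,inf)
bins = [float('-inf'), 1, 4, 8, float('inf')]
labels = ['<1h', '1-4h', '4-8h', '>8h']
bucket = pd.cut(hours, bins=bins, labels=labels)
print(bucket.value_counts())
```

Note that .dt.total_seconds() counts whole days as well, whereas .dt.seconds returns only the seconds component of each timedelta, so the former is the safer choice when a charging session crosses midnight.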

The result is shown in the figure below:

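A similar example is computing the distribution of charging start times across these four regions (a requirement driven by grid electricity pricing).

The core is to convert the whole Series of compact time strings into a datetime-style format in one go, i.e. '20190101235502' becomes 2019-01-01 23:55:02,

and then call pd.to_datetime.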
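A minimal sketch of this conversion on made-up values follows. Route 1 is the slicing approach used in this post; route 2 passes an explicit format string to pd.to_datetime, which skips the intermediate string rebuild. Counting start times per hour at the end is only one assumed way to look at the start-period distribution.

```python
import pandas as pd

# Hypothetical compact timestamps (YYYYmmddHHMMSS).
raw = pd.Series([20190101235502, 20190101070000, 20190101071530])

# Route 1 (as in this post): insert separators by slicing, then parse.
as_text = raw.astype(str).apply(
    lambda v: v[0:4]+'-'+v[4:6]+'-'+v[6:8]+' '+v[8:10]+':'+v[10:12]+':'+v[12:14])
sliced = pd.to_datetime(as_text)

# Route 2: let pd.to_datetime parse the compact form directly.
direct = pd.to_datetime(raw.astype(str), format='%Y%m%d%H%M%S')

# Charging start hour and how many sessions start in each hour.
start_hour = direct.dt.hour
per_hour = start_hour.value_counts().sort_index()
print(per_hour)
```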