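Table of contents

Introduction to Pandas
- Series
- DataFrame
Selecting data in Pandas
- Simple selection
- Selection by label: loc
- Selection by position: iloc
- Mixed selection: ix
- Selection by boolean condition
Setting values in Pandas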

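Introduction to Pandas

A NumPy array behaves like a list: its values have no labels and are addressed by position. Pandas behaves more like a dictionary: values can be addressed by label. Pandas is built on top of NumPy and makes NumPy-centric applications simpler.
Pandas has two main data structures, Series and DataFrame.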
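
To make the list-versus-dictionary comparison concrete, here is a minimal sketch. The values and the labels "x", "y", "z" are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])                           # values only, addressed by position
ser = pd.Series([10, 20, 30], index=["x", "y", "z"])   # same values, each with a label

print(arr[1])          # positional access
print(ser["y"])        # label-based access, similar to a dictionary lookup
print(ser.to_numpy())  # the Series values as a NumPy array
```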

Series

import pandas as pd
import numpy as np
s = pd.Series([1,3,6,np.nan,44,1])
print(s)
print(s[1])   # elements can be accessed directly by index

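The string representation of a Series shows the index on the left and the values on the right. Because no index was specified, an integer index from 0 to N-1 is created by default. Below is a Series with an explicit index: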

grade = pd.Series([100,59,80],index=["李明","李紅","王美"])
print(grade.values)
print(grade.index)
print(grade["李明"])
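
A labelled Series also behaves like a dictionary in other ways. The sketch below uses hypothetical subject names and scores (not from the original example) to show membership tests, filtering, and label-based alignment.

```python
import pandas as pd

# Hypothetical scores, used only for illustration
scores = pd.Series({"math": 90, "art": 75, "music": 60})

print("art" in scores)        # membership test checks the index labels
print(scores[scores >= 75])   # a boolean mask keeps the labels attached

bonus = pd.Series({"art": 5, "music": 10})
total = scores.add(bonus, fill_value=0)   # aligns by label, not by position
print(total)
```

Arithmetic between two Series matches values by label, so "math", which has no bonus, is combined with the fill value 0.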

DataFrame

dates = pd.date_range("20160101",periods=6)
df = pd.DataFrame(np.random.randn(6,4),index = dates,columns=['a','b','c','d'])  
print(df)

                   a         b         c         d
2016-01-01 -0.378199 -0.300236 -1.207843 -1.658223
2016-01-02 -1.031397 -0.834695 -0.417703 -0.318720
2016-01-03 -2.346667  1.615651  1.726296  1.152253
2016-01-04  1.389872  0.952453 -0.737092  1.555059
2016-01-05  0.735490  0.297005 -0.542341  0.559540
2016-01-06 -1.962791  1.776028 -1.917368 -0.679542

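A DataFrame is a tabular data structure containing an ordered collection of columns, each of which can hold a different value type (numeric, string, boolean, and so on). A DataFrame has both a row index and a column index, and it can be thought of as a large dictionary of Series.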
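
The "dictionary of Series" view can be sketched as follows. The column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Build a DataFrame from a dictionary that maps column names to Series
frame = pd.DataFrame({
    "count": pd.Series([1, 2, 3]),
    "label": pd.Series(["x", "y", "z"]),
    "flag": pd.Series([True, False, True]),
})

print(frame.dtypes)           # each column keeps its own dtype
print(type(frame["count"]))   # selecting one column returns a Series
print(list(frame.keys()))     # keys() returns the column labels, as with a dict
```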

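The following accesses data in the DataFrame. Note that when a specific element is accessed this way, the column label comes first and the row label second.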

print(df['b'])

2016-01-01   -0.300236
2016-01-02   -0.834695
2016-01-03    1.615651
2016-01-04    0.952453
2016-01-05    0.297005
2016-01-06    1.776028
Freq: D, Name: b, dtype: float64

print(df['b']['2016-01-05'])

0.2970052798746942
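
The same element can also be read in one step with loc, which takes the row label first and the column label second. This is a minimal sketch; it fills the frame with np.arange instead of random numbers (an assumption made so the result is reproducible).

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20160101", periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=["a", "b", "c", "d"])

chained = df["b"]["2016-01-05"]      # column first, then row
direct = df.loc["2016-01-05", "b"]   # loc: row first, then column
print(chained, direct)
```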

Create a DataFrame without explicit row or column labels and access it:

df = pd.DataFrame(np.arange(12).reshape((3,4)))
print(df)
print(df[1][0])
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
1

Specify the type of each column:

df1 = pd.DataFrame({
    'A':1,
    'B':pd.Timestamp('20180928'),
    'C':pd.Series(1,index=list(range(4)),dtype='float32'),
    'D':np.array([3]*4,dtype='int32'),
    'E':pd.Categorical(["test","train","test","train"]),
    'F':'foo'})
print(df1)
print(df1['B'])
print(df1['B'][1])

   A          B    C  D      E    F
0  1 2018-09-28  1.0  3   test  foo
1  1 2018-09-28  1.0  3  train  foo
2  1 2018-09-28  1.0  3   test  foo
3  1 2018-09-28  1.0  3  train  foo
0   2018-09-28
1   2018-09-28
2   2018-09-28
3   2018-09-28
Name: B, dtype: datetime64[ns]
2018-09-28 00:00:00
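
To confirm which type each column ended up with, the dtypes attribute can be inspected. A short sketch that rebuilds the same df1:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'A': 1,
    'B': pd.Timestamp('20180928'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo'})

print(df1.dtypes)   # one dtype per column
```

Scalars such as 1 and 'foo' are broadcast to the length set by the other columns.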

View the row labels (the index):

print(df1.index)

Int64Index([0, 1, 2, 3], dtype='int64')

View the column names:

print(df1.columns)
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

View all the values:

print(df1.values)

[[1 Timestamp('2018-09-28 00:00:00') 1.0 3 'test' 'foo']
 [1 Timestamp('2018-09-28 00:00:00') 1.0 3 'train' 'foo']
 [1 Timestamp('2018-09-28 00:00:00') 1.0 3 'test' 'foo']
 [1 Timestamp('2018-09-28 00:00:00') 1.0 3 'train' 'foo']]

View a statistical summary of the numeric columns:

print(df1.describe())

         A    C    D
count  4.0  4.0  4.0
mean   1.0  1.0  3.0
std    0.0  0.0  0.0
min    1.0  1.0  3.0
25%    1.0  1.0  3.0
50%    1.0  1.0  3.0
75%    1.0  1.0  3.0
max    1.0  1.0  3.0
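
The summary above lists only A, C and D because describe() covers numeric columns by default. A small sketch with a hypothetical two-column frame shows the default next to include="all":

```python
import pandas as pd

# Hypothetical frame with one numeric and one string column
frame = pd.DataFrame({"n": [1, 2, 3, 4], "s": ["a", "b", "a", "a"]})

numeric = frame.describe()                   # numeric columns only
everything = frame.describe(include="all")   # also summarises non-numeric columns

print(numeric)
print(everything)
```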

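Sort the data by its index labels and print the result. With axis=1 the column labels are sorted, here in descending order: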

print(df1.sort_index(axis=1,ascending=False))
     F      E  D    C          B  A
0  foo   test  3  1.0 2018-09-28  1
1  foo  train  3  1.0 2018-09-28  1
2  foo   test  3  1.0 2018-09-28  1
3  foo  train  3  1.0 2018-09-28  1

Sort the rows by the values in a column and print the result:

print(df1.sort_values(by='B'))
   A          B    C  D      E    F
0  1 2018-09-28  1.0  3   test  foo
1  1 2018-09-28  1.0  3  train  foo
2  1 2018-09-28  1.0  3   test  foo
3  1 2018-09-28  1.0  3  train  foo
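
Because every row of df1 holds the same values, the effect of sorting is hard to see. The sketch below uses a small hypothetical frame with an unordered index so that each call visibly reorders something.

```python
import pandas as pd

# Hypothetical data with an unordered index
frame = pd.DataFrame({"x": [3, 1, 2], "y": ["c", "a", "b"]}, index=[10, 30, 20])

by_index = frame.sort_index()                            # rows ordered by index label
by_columns = frame.sort_index(axis=1, ascending=False)   # columns ordered by name, descending
by_value = frame.sort_values(by="x")                     # rows ordered by the values in column x

print(by_index)
print(by_columns)
print(by_value)
```

Each call returns a new DataFrame and leaves frame unchanged.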

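Selecting data in Pandas

Simple selection

# Rebuild df for this section: six dates as the index, the values 0 to 23
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['A','B','C','D'])
print(df)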
            A   B   C   D
2013-01-01   0   1   2   3
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23

print(df['A'])
2013-01-01     0
2013-01-02     4
2013-01-03     8
2013-01-04    12
2013-01-05    16
2013-01-06    20
Freq: D, Name: A, dtype: int64

print(df.A)
2013-01-01     0
2013-01-02     4
2013-01-03     8
2013-01-04    12
2013-01-05    16
2013-01-06    20
Freq: D, Name: A, dtype: int64

print(df[0:3])
            A  B   C   D
2013-01-01  0  1   2   3
2013-01-02  4  5   6   7
2013-01-03  8  9  10  11

print(df['20130102':'20130104'])
             A   B   C   D
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
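
The two slices above follow different rules: a positional slice excludes its end position, while a label slice includes its end label. A minimal sketch of the difference, rebuilding the same frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

by_position = df[0:3]                     # positions 0, 1, 2; position 3 is excluded
by_label = df['20130102':'20130104']      # the end label 2013-01-04 is included

print(by_position.index[-1])
print(by_label.index[-1])
```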

Selection by label: loc

print(df.loc['20130102'])
A    4
B    5
C    6
D    7
Name: 2013-01-02 00:00:00, dtype: int64

print(df.loc[:,['A','B']])
             A   B
2013-01-01   0   1
2013-01-02   4   5
2013-01-03   8   9
2013-01-04  12  13
2013-01-05  16  17
2013-01-06  20  21

print(df.loc['20130102',['A','B']])
A    4
B    5
Name: 2013-01-02 00:00:00, dtype: int64
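
loc also accepts label slices on both axes, and a single row label combined with a single column label returns a scalar. A short sketch on the same data:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

block = df.loc['20130102':'20130103', 'B':'D']   # both slices include their end labels
cell = df.loc['20130102', 'B']                   # one row label and one column label

print(block)
print(cell)
```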

Selection by position: iloc

print(df.iloc[3,1])
13

print(df.iloc[3:5,1:3])
             B   C
2013-01-04  13  14
2013-01-05  17  18

print(df.iloc[[1,3,5],1:3])
             B   C
2013-01-02   5   6
2013-01-04  13  14
2013-01-06  21  22
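
The difference between position and label is easiest to see when the index holds integers that do not match the row positions. The frame below is hypothetical and exists only to show that contrast.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions
frame = pd.DataFrame({"v": ["a", "b", "c"]}, index=[10, 20, 30])

print(frame.loc[10, "v"])    # label 10 is the first row
print(frame.iloc[0, 0])      # position 0 is the same row
print(frame.iloc[-1, 0])     # negative positions count from the end
print(frame.iloc[0:2])       # positional slices exclude the end position
```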

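Mixed selection: ix

ix mixed label-based and position-based selection. It was deprecated in pandas 0.20 and removed in pandas 1.0, so on current versions the same selection is written with loc:

print(df.loc[df.index[:3],['A','C']])   # first three rows by position, columns by label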
            A   C
2013-01-01  0   2
2013-01-02  4   6
2013-01-03  8  10
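
On pandas versions without ix, a selection that mixes row positions with column labels can be sketched in a few equivalent ways. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
cols = ['A', 'C']

via_chain = df.iloc[:3][cols]                              # rows by position, then columns by label
via_loc = df.loc[df.index[:3], cols]                       # convert positions to labels, then use loc
via_iloc = df.iloc[:3, df.columns.get_indexer(cols)]       # convert labels to positions, then use iloc

print(via_loc)
```

The loc and iloc forms select in a single indexing step, which is the safer choice when the result will be assigned to.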

Selection by boolean condition

print(df[df.A>8])
             A   B   C   D
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23
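
Several conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on the same data:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

both = df[(df.A > 8) & (df.D < 23)]     # rows where both conditions hold
either = df[(df.A < 4) | (df.B > 20)]   # rows where at least one condition holds

print(both)
print(either)
```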

Setting values in Pandas

# Build a fresh DataFrame to modify
dates = pd.date_range('20180901',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['A','B','C','D'])
print(df)
             A   B   C   D
2018-09-01   0   1   2   3
2018-09-02   4   5   6   7
2018-09-03   8   9  10  11
2018-09-04  12  13  14  15
2018-09-05  16  17  18  19
2018-09-06  20  21  22  23

# Set by label (loc) and by position (iloc)
df.loc['20180903','B'] = 100
df.iloc[5,3] = 200
print(df)
             A    B   C    D
2018-09-01   0    1   2    3
2018-09-02   4    5   6    7
2018-09-03   8  100  10   11
2018-09-04  12   13  14   15
2018-09-05  16   17  18   19
2018-09-06  20   21  22  200


# Set values by condition (a single loc call avoids chained assignment)
df.loc[df.A>9,'B'] = 0
print(df)
             A    B   C    D
2018-09-01   0    1   2    3
2018-09-02   4    5   6    7
2018-09-03   8  100  10   11
2018-09-04  12    0  14   15
2018-09-05  16    0  18   19
2018-09-06  20    0  22  200
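
Conditional assignment is most reliable when the row condition and the column are passed to loc together, because a chained form such as df.B[mask] = 0 may end up assigning to a temporary object. The sketch below shows the loc form. The mask() call is an addition to the original example that returns a new Series instead of modifying df.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180901', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

df.loc[df.A > 9, 'B'] = 0                   # rows where A > 9, column B, in one step
capped = df['C'].mask(df['C'] > 15, 15)     # replace values above 15; df itself is unchanged

print(df)
print(capped)
```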

# Set a whole column
df['F'] = 0
print(df)
             A    B   C    D  F
2018-09-01   0    1   2    3  0
2018-09-02   4    5   6    7  0
2018-09-03   8  100  10   11  0
2018-09-04  12    0  14   15  0
2018-09-05  16    0  18   19  0
2018-09-06  20    0  22  200  0

# Add data
df['E'] = pd.Series([1,2,3,4,5,6],index = dates)
print(df)
             A    B   C    D  F  E
2018-09-01   0    1   2    3  0  1
2018-09-02   4    5   6    7  0  2
2018-09-03   8  100  10   11  0  3
2018-09-04  12    0  14   15  0  4
2018-09-05  16    0  18   19  0  5
2018-09-06  20    0  22  200  0  6
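
When a Series is assigned as a new column, its values are matched to rows by index label rather than by position. The sketch below assigns a hypothetical Series that covers only the first three dates, so the remaining rows receive NaN.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180901', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

partial = pd.Series([1, 2, 3], index=dates[:3])   # covers only the first three dates
df['E'] = partial                                  # aligned by label; missing rows become NaN

print(df)
```

Because NaN is a floating-point value, column E is stored as float64 even though the Series held integers.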

This article is reposted from 莫煩Python.
