【Python】Data Analysis with pandas: Indexes, Reindexing, and Composite Indexes

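Contents

Data Analysis with pandas: Indexes and Composite Indexes

1. Indexes (Index)
2. Reindexing (reindex)
3. Composite indexes
4. Working with composite indexes on Series and DataFrame

Data Analysis with pandas: Indexes and Composite Indexes

1. Indexes (Index)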

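pandas Index objects hold the axis labels and other metadata (such as the axis name). When you construct a Series or DataFrame, any array or other sequence of labels that you pass as the index is converted to an Index: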

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3,4)),index=(list("abc")),columns=(list("ABCD")))
print(df)
#    A  B   C   D
# a  0  1   2   3
# b  4  5   6   7
# c  8  9  10  11

print(df.index)
# Index(['a', 'b', 'c'], dtype='object')

Besides being array-like, an Index also behaves like a set, so you can test labels for membership:

In [85]: frame3
Out[85]: 
state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

In [86]: frame3.columns
Out[86]: Index(['Nevada', 'Ohio'], dtype='object', name='state')

In [87]: 'Ohio' in frame3.columns
Out[87]: True

In [88]: 2003 in frame3.index
Out[88]: False
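
The session above uses frame3 without defining it. A minimal sketch that builds a frame matching the printed output, assuming the data comes from a nested dict (the name pop and the construction are assumptions, not part of the original session):

```python
import pandas as pd

# Hypothetical source data, chosen only to reproduce the frame3 output shown above.
pop = {
    "Nevada": {2001: 2.4, 2002: 2.9},
    "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6},
}
frame3 = pd.DataFrame(pop).sort_index()  # inner keys become the row labels
frame3.index.name = "year"
frame3.columns.name = "state"

has_ohio = "Ohio" in frame3.columns  # membership test on the column labels
has_2003 = 2003 in frame3.index      # membership test on the row labels
print(has_ohio, has_2003)
```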

Unlike a Python set, a pandas Index can contain duplicate labels. Selecting by a duplicated label returns every entry that carries that label.

In [89]: dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
In [90]: dup_labels
Out[90]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
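
To make the effect of duplicate labels concrete, here is a small sketch that attaches dup_labels to a Series (the Series s and its values are made up for illustration):

```python
import pandas as pd

dup_labels = pd.Index(["foo", "foo", "bar", "bar"])
s = pd.Series([1, 2, 3, 4], index=dup_labels)  # illustrative values

foo_values = s["foo"]        # a Series holding both 'foo' entries, not a scalar
print(foo_values.tolist())
print(dup_labels.is_unique)  # reports whether every label occurs only once
```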

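Index objects also provide methods and properties for set logic, including union, intersection, difference, isin, unique, and is_unique.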
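
A few of the set-style operations on Index can be sketched as follows. The indexes left and right are made-up examples, and this is only a small selection of the available methods:

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

both = left.intersection(right)      # labels present in both indexes
either = left.union(right)           # labels present in at least one index
only_left = left.difference(right)   # labels in left that are not in right
mask = left.isin(["a", "c"])         # boolean array, one entry per label in left

print(list(both), list(either), list(only_left), mask.tolist())
```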


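2. Reindexing (reindex)

The reindex method of a pandas object returns a new object whose data is rearranged to match a new index. Labels that are not present in the original index get missing values (NaN) unless a fill method is given.

The method option accepts one of: {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}. Filling requires the original index to be monotonically increasing or decreasing.

None (the default): do not fill

pad/ffill: forward fill, propagating the last valid value forward

backfill/bfill: backward fill, using the next valid value

nearest: fill with the value at the nearest index label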

In [95]: obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
In [96]: obj3
Out[96]: 
0      blue
2    purple
4    yellow
dtype: object

In [97]: obj3.reindex(range(6), method='ffill')
Out[97]: 
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
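
For comparison with the ffill result above, the following sketch applies the other method options to the same obj3. The variable names no_fill, backward, and nearest are only illustrative:

```python
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

no_fill = obj3.reindex(range(6))                    # default: new labels 1, 3, 5 become NaN
backward = obj3.reindex(range(6), method="bfill")   # take the next valid value; label 5 has none
nearest = obj3.reindex(range(6), method="nearest")  # take the value at the closest original label

print(no_fill.isna().sum())
print(backward.tolist())
```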

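Columns can be reindexed with the columns keyword. Columns that do not exist in the original are filled with NaN:

df = pd.DataFrame(np.arange(12).reshape((3,4)),
index=(list("abc")),columns=(list("ABCD")))
print(df)
#    A  B   C   D
# a  0  1   2   3
# b  4  5   6   7
# c  8  9  10  11

# Keep A, B and D, drop C, and add a new, empty column E
print(df.reindex(columns=list("ABDE")))
#    A  B   D    E
# a  0  1   3  NaN
# b  4  5   7  NaN
# c  8  9  11  NaN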

On a DataFrame, reindex can change the (row) index, the columns, or both. When passed only a sequence, it reindexes the rows of the result:

df = pd.DataFrame(np.arange(12).reshape((3,4)),
index=(list("abc")),columns=(list("ABCD")))
print(df)
#    A  B   C   D
# a  0  1   2   3
# b  4  5   6   7
# c  8  9  10  11

print(df.index)
# Index(['a', 'b', 'c'], dtype='object')

df3 = df.reindex(list("abcd"))
print(df3)
#      A    B     C     D
# a  0.0  1.0   2.0   3.0
# b  4.0  5.0   6.0   7.0
# c  8.0  9.0  10.0  11.0
# d  NaN  NaN   NaN   NaN
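
Rows and columns can also be reindexed in a single call. The sketch below additionally uses the fill_value parameter, which the text above does not cover, to replace the default NaN with 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=list("abc"), columns=list("ABCD"))

# Row "d" and column "E" do not exist in df, so their cells take fill_value.
out = df.reindex(index=list("abd"), columns=list("ABE"), fill_value=0)
print(out)
```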


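3. Composite indexes

An index can be built from one, two, or more columns. An index built from two or more columns is called a composite index; pandas represents it as a MultiIndex.

The set_index method of a DataFrame turns one or more existing columns into the index.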

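Main parameters:

keys:

The column name, or list of column names, to use as the index.

drop:

True or False. Whether to delete the columns that become the new index. Defaults to True, i.e. the columns are removed. (Delete columns to be used as the new index.)

append:

True or False. Whether to append the new columns to the existing index instead of replacing it. Defaults to False, i.e. the existing index is discarded. (Whether to append columns to existing index.)

inplace:

True or False. Whether to modify the DataFrame in place instead of returning a new one. Defaults to False, i.e. a new DataFrame is returned and the original is left unchanged. (Modify the DataFrame in place (do not create a new object).)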
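
The effect of append can be sketched as follows. The variable names replaced, appended, and restored are illustrative, and reset_index is included only to show how to undo set_index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=list("abc"), columns=list("ABCD"))

replaced = df.set_index("A")               # append=False (default): old index a, b, c is discarded
appended = df.set_index("A", append=True)  # old index is kept as the outer level of a MultiIndex
restored = replaced.reset_index()          # move the index back into an ordinary column

print(appended.index.nlevels)
print(list(restored.columns))
```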

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3,4)),
index=(list("abc")),columns=(list("ABCD")))
print(df)
#    A  B   C   D
# a  0  1   2   3
# b  4  5   6   7
# c  8  9  10  11

print(df.index)
# Index(['a', 'b', 'c'], dtype='object')

# Assign new index labels directly
df.index = ["a","b","m"]
print(df)
#    A  B   C   D
# a  0  1   2   3
# b  4  5   6   7
# m  8  9  10  11

# Reindex: the label d is absent from the current index, so its row is NaN
print(df.reindex(list("abd")))
#      A    B    C    D
# a  0.0  1.0  2.0  3.0
# b  4.0  5.0  6.0  7.0
# d  NaN  NaN  NaN  NaN

# Use one column as the index
print(df.set_index("A"))
#    B   C   D
# A
# 0  1   2   3
# 4  5   6   7
# 8  9  10  11
# drop=False keeps column A in the data as well
print(df.set_index("A",drop=False))
#    A  B   C   D
# A
# 0  0  1   2   3
# 4  4  5   6   7
# 8  8  9  10  11

# Use two columns as the index
df1 = df.set_index(["A","B"],drop=False)
print(df1)
#      A  B   C   D
# A B
# 0 1  0  1   2   3
# 4 5  4  5   6   7
# 8 9  8  9  10  11
print(df1.index)
# MultiIndex([(0, 1),
#             (4, 5),
#             (8, 9)],
#            names=['A', 'B'])

# Use three columns as the index
df2 = df.set_index(["A","B","C"],drop=False)
print(df2)
#         A  B   C   D
# A B C
# 0 1 2   0  1   2   3
# 4 5 6   4  5   6   7
# 8 9 10  8  9  10  11
print(df2.index)
# MultiIndex([(0, 1,  2),
#             (4, 5,  6),
#             (8, 9, 10)],
#            names=['A', 'B', 'C'])
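
Once a frame has a MultiIndex, its levels can be inspected and rows selected by a full key tuple. A minimal sketch reusing the two-column index from above (level_count, b_labels, and row are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=list("abm"), columns=list("ABCD"))
df1 = df.set_index(["A", "B"], drop=False)

level_count = df1.index.nlevels             # number of index levels
b_labels = df1.index.get_level_values("B")  # all labels of the level named B
row = df1.loc[(4, 5), :]                    # the row whose (A, B) key is (4, 5)

print(level_count, list(b_labels))
print(row["C"], row["D"])
```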


4. Working with composite indexes on Series and DataFrame

a = pd.DataFrame({'a':range(7),'b':range(7,0,-1),'c':['one','one','one','two','two','two','two'],'d':list("hjklmno")})
print(a)
#    a  b    c  d
# 0  0  7  one  h
# 1  1  6  one  j
# 2  2  5  one  k
# 3  3  4  two  l
# 4  4  3  two  m
# 5  5  2  two  n
# 6  6  1  two  o
b = a.set_index(["c","d"])  # composite index built from columns c and d
print(b)
#        a  b
# c   d
# one h  0  7
#     j  1  6
#     k  2  5
# two l  3  4
#     m  4  3
#     n  5  2
#     o  6  1

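print(b.loc["one"].loc["h"])  # look up the outer label, then the inner label
# a    0
# b    7
# Name: h, dtype: int64

# Indexing a Series that has a composite index
c = b["a"]  # column a as a Series indexed by (c, d)
print(c["one"]["j"])  # 1
print(c["one"])
# d
# h    0
# j    1
# k    2
# Name: a, dtype: int64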

# Now swap the order of c and d in the index
d = a.set_index(["d","c"])["a"]
print(d)
# d  c
# h  one    0
# j  one    1
# k  one    2
# l  two    3
# m  two    4
# n  two    5
# o  two    6
# Name: a, dtype: int64

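# To select the entries whose inner-level label is "one":
# Option 1: look up each outer label one at a time, e.g. d["h"]["one"], and collect the results
# Option 2: swap the two levels with swaplevel()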
e = d.swaplevel()
print(e)
# c    d
# one  h    0
#      j    1
#      k    2
# two  l    3
#      m    4
#      n    5
#      o    6
# Name: a, dtype: int64
print(e["one"])
# d
# h    0
# j    1
# k    2
# Name: a, dtype: int64
print(e["one","h"])
# 0
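
As a third option, which the article does not cover, the xs method can take a cross-section on the inner level directly, without swapping levels. A minimal sketch using the same data (inner_one is an illustrative name):

```python
import pandas as pd

a = pd.DataFrame({"a": range(7), "b": range(7, 0, -1),
                  "c": ["one"] * 3 + ["two"] * 4, "d": list("hjklmno")})
d = a.set_index(["d", "c"])["a"]

# Select every entry whose level "c" equals "one"; the level order is left untouched.
inner_one = d.xs("one", level="c")
print(inner_one.tolist())
```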
