Plotting a confusion matrix with Python's matplotlib: an example

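As mentioned earlier, the confusion matrix is a very important metric for classification problems. So how can it be displayed nicely? One option is a plain table, another is front-end visualization. I once built it with a front-end tool (D5) and took a screenshot, but the result didn't look very good.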

Code:

import itertools
import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        # Divide each row by its total, so each row shows how the samples
        # of one true class are spread over the predicted classes
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each value into its cell: white text on dark cells, black on light ones
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()  # after setting labels, so they are not clipped
    # To save the figure, call savefig before show(); after show() it may be blank
    # plt.savefig('confusion_matrix', dpi=200)
    plt.show()
 
cnf_matrix = np.array([
 [4101, 2, 5, 24, 0],
 [50, 3930, 6, 14, 5],
 [29, 3, 3973, 4, 0],
 [45, 7, 1, 3878, 119],
 [31, 1, 8, 28, 3936],
])
 
class_names = ['Buildings', 'Farmland', 'Greenbelt', 'Wasteland', 'Water']
 
# plt.figure()
# plot_confusion_matrix(cnf_matrix, classes=class_names,
#      title='Confusion matrix, without normalization')
 
# Plot normalized confusion matrix
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names, normalize=True,
      title='Normalized confusion matrix')


Just replace the matrix above with your own confusion matrix. The visualization step can also be done directly while the model is running.
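
If you start from raw labels rather than a finished matrix, you can compute the matrix and save it as an image in one step, for example at the end of an evaluation run. The sketch below shows one way to do this. It assumes scikit-learn is installed and that labels are integer class indices 0..n-1. The helper `save_confusion_matrix`, the file name and the class names are hypothetical. It uses matplotlib's non-interactive Agg backend, so it also works on a server with no display, and it calls `savefig` instead of `show`.

```python
import os

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, no window needed
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


def save_confusion_matrix(y_true, y_pred, class_names, path, normalize=True):
    """Hypothetical helper: compute a confusion matrix from labels and save it as an image.

    Assumes labels are integer class indices 0..len(class_names)-1.
    """
    labels = list(range(len(class_names)))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    if normalize:
        # Row-normalize; rows with no samples stay 0 instead of dividing by zero
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = np.divide(cm, row_sums, out=np.zeros(cm.shape), where=row_sums != 0)

    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(cm, interpolation="nearest", cmap=plt.cm.Blues)
    fig.colorbar(im, ax=ax)
    ticks = np.arange(len(class_names))
    ax.set_xticks(ticks)
    ax.set_xticklabels(class_names, rotation=45)
    ax.set_yticks(ticks)
    ax.set_yticklabels(class_names)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")

    # Annotate each cell; white text on dark cells, black on light ones
    fmt = ".2f" if normalize else "d"
    thresh = cm.max() / 2.0
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt), ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")

    fig.tight_layout()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fig.savefig(path, dpi=200)  # save to disk; no plt.show() needed
    plt.close(fig)  # free memory when called repeatedly during training
    return cm


# Example with three hypothetical classes
cm = save_confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2],
                           ["cat", "dog", "bird"], "confusion_matrix.png")
print(cm)
```

Closing the figure after saving matters if you call the helper once per epoch. Otherwise matplotlib keeps every figure open.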

Supplementary: how the confusion matrix works and how to use it (scikit-learn and TensorFlow)

Principle

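In machine learning, a confusion matrix is an error matrix, commonly used to visually evaluate the performance of supervised learning algorithms. It is a square matrix of size (n_classes, n_classes), where n_classes is the number of classes. Each row of the matrix represents the instances of an actual class, and each column represents the instances of a predicted class; this is the convention used by TensorFlow and scikit-learn. The reverse convention also exists: each row represents a predicted class and each column an actual class (the definition given in Wikipedia's "Confusion matrix" article). A confusion matrix makes it easy to see whether a system confuses two classes, which is where the name comes from.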
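
To make the row/column convention concrete, here is a minimal NumPy sketch that counts the matrix by hand. Rows are true classes and columns are predicted classes, as in scikit-learn and TensorFlow, and transposing gives the other convention. The function name `confusion_matrix_np` and the data are hypothetical, and labels are assumed to be integers 0..n_classes-1.

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, n_classes):
    """Count observations: cm[i, j] = number of samples of true class i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

cm = confusion_matrix_np(y_true, y_pred, 3)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]

# Alternative convention: rows = predicted class, columns = true class
cm_pred_rows = cm.T

# Off-diagonal cells are the confusions; find the most frequent one
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"class {i} is most often predicted as class {j}")
```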

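A confusion matrix is a special kind of contingency table (also called a cross tabulation, or crosstab). It has two dimensions, "actual" and "predicted", and both dimensions share the same set of "classes". In a contingency table, each combination of dimension and class is a variable. A contingency table shows the frequency distribution of several variables in tabular form.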
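
Seen as a contingency table, each cell is simply the frequency of one (actual, predicted) combination. Here is a minimal standard-library sketch with hypothetical string labels. If you use pandas, `pandas.crosstab` builds the same kind of table.

```python
from collections import Counter

actual = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]

# Both dimensions use the same set of classes
classes = sorted(set(actual) | set(predicted))
# Frequency of each (actual, predicted) combination
counts = Counter(zip(actual, predicted))

table = [[counts[(a, p)] for p in classes] for a in classes]

print("actual\\predicted", *classes, sep="\t")
for a, row in zip(classes, table):
    print(a, *row, sep="\t")
```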

Using the confusion matrix (scikit-learn and TensorFlow)

Below, we first introduce the API (Application Programming Interface) functions that compute a confusion matrix in scikit-learn and TensorFlow, and then use both of them in an example.

scikit-learn confusion matrix function: sklearn.metrics.confusion_matrix API

sklearn.metrics.confusion_matrix(
    y_true,              # array, ground truth (correct) target values
    y_pred,              # array, estimated targets as returned by a classifier
    labels=None,         # array, list of labels to index the matrix
    sample_weight=None   # array-like of shape = [n_samples], optional sample weights
)


In scikit-learn, the confusion matrix is computed to evaluate the accuracy of a classification.
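
For example, overall accuracy is the diagonal (the correct predictions) divided by the total count. Dividing the diagonal by each row sum gives per-class recall. Here is a small sketch with hypothetical labels, cross-checked against `sklearn.metrics.accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
# rows = true class, columns = predicted class:
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

accuracy = np.trace(cm) / cm.sum()               # correct predictions / all predictions
per_class_recall = np.diag(cm) / cm.sum(axis=1)  # correct in each true class / class size

print(accuracy, accuracy_score(y_true, y_pred))  # both equal 4/6
print(per_class_recall)                          # values 0.5, 1.0, 0.5
```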

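By definition, the element $C_{i,j}$ of the confusion matrix $C$ is the number of observations whose true group is $i$ and whose predicted group is $j$. So for a binary classification task, the number of true negatives (TN) is $C_{0,0}$, false negatives (FN) is $C_{1,0}$, true positives (TP) is $C_{1,1}$, and false positives (FP) is $C_{0,1}$.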
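
For the binary case, flattening the 2x2 matrix row by row gives the four counts in the order TN, FP, FN, TP. The data below is hypothetical. Passing `labels=[0, 1]` explicitly keeps the matrix 2x2 even when a batch happens to contain only one class.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1]
y_pred = [1, 1, 1, 0]

# For [[C00, C01], [C10, C11]], ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 0 2 1 1

precision = tp / (tp + fp)  # 1 / 3
recall = tp / (tp + fn)     # 1 / 2
```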

If labels is None, scikit-learn uses every value that appears in y_true or y_pred as the label list, in sorted order.
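
The sketch below shows that ordering with hypothetical string labels. `sklearn.utils.multiclass.unique_labels` prints the sorted union. Passing `labels` explicitly sets the row/column order, and predictions or true values that are not in the list are left out of the matrix.

```python
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

y_true = ["cat", "dog", "cat"]
y_pred = ["dog", "dog", "bird"]

# labels=None: rows/columns follow the sorted union of both lists (bird, cat, dog)
print(unique_labels(y_true, y_pred))
cm_default = confusion_matrix(y_true, y_pred)
# [[0 0 0]   bird
#  [1 0 1]   cat (one cat predicted as bird, one as dog)
#  [0 0 1]]  dog

# Explicit labels fix the order and drop values not listed (the "bird" prediction)
cm_subset = confusion_matrix(y_true, y_pred, labels=["dog", "cat"])
# [[1 0]   dog
#  [1 0]]  cat
```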

TensorFlow confusion matrix function: tf.confusion_matrix API

# TensorFlow 1.x name; in TensorFlow 2.x it is available as tf.math.confusion_matrix
tf.confusion_matrix(
    labels,             # 1-D Tensor of real labels for the classification task
    predictions,        # 1-D Tensor of predictions for a given classification
    num_classes=None,   # the possible number of labels the classification task can have
    dtype=tf.int32,     # data type of the confusion matrix
    name=None,          # scope name
    weights=None,       # an optional Tensor whose shape matches predictions
)


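The num_classes parameter of TensorFlow's tf.confusion_matrix plays a role similar to the labels parameter of scikit-learn's sklearn.metrics.confusion_matrix. Both concern the labels, but num_classes gives only the total number of classes and does not list the actual label values. TensorFlow generally uses integers as labels, so labels of a non-integer type such as strings must first be converted to integers. If num_classes is None, it is set to the largest value in labels and predictions, plus 1.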
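
Below is a NumPy sketch of the sizing rule described above. It is not TensorFlow code, and `tf_style_confusion_matrix` is a hypothetical name. It also shows one way to map string labels to integers first, using `np.unique(..., return_inverse=True)`.

```python
import numpy as np


def tf_style_confusion_matrix(labels, predictions, num_classes=None):
    """Hypothetical NumPy sketch of the num_classes rule described above (not TensorFlow)."""
    labels = np.asarray(labels, dtype=np.int64)
    predictions = np.asarray(predictions, dtype=np.int64)
    if num_classes is None:
        # Largest value in labels/predictions, plus one
        num_classes = int(max(labels.max(), predictions.max())) + 1
    cm = np.zeros((num_classes, num_classes), dtype=np.int32)
    np.add.at(cm, (labels, predictions), 1)  # unbuffered: repeated (i, j) pairs all count
    return cm


cm_default = tf_style_confusion_matrix([1, 2, 4], [2, 2, 4])
print(cm_default.shape)  # (5, 5): labels 0 and 3 get empty rows/columns

# String labels must be converted to integers first
names_true = ["cat", "dog", "cat"]
names_pred = ["dog", "dog", "cat"]
classes, encoded = np.unique(names_true + names_pred, return_inverse=True)
n = len(names_true)
cm_named = tf_style_confusion_matrix(encoded[:n], encoded[n:], num_classes=len(classes))
```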

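The weights parameter of tf.confusion_matrix means the same thing as the sample_weight parameter of sklearn.metrics.confusion_matrix. Both weight each prediction, and the cells of the confusion matrix are computed from those weights: each sample adds its weight to its cell instead of adding 1.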
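
Here is a NumPy sketch of that weighting, using the same data as the usage example below. Each sample adds its weight to cell [true, predicted]. The result is compared with scikit-learn's `sample_weight` output for six classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 2, 4]
y_pred = [2, 2, 4]
weights = [0.3, 0.4, 0.3]

# Each sample adds its weight (instead of 1) to cell [true, predicted]
cm = np.zeros((6, 6))
np.add.at(cm, (y_true, y_pred), weights)

cm_sk = confusion_matrix(y_true, y_pred, labels=list(range(6)), sample_weight=weights)
print(np.allclose(cm, cm_sk))  # True
```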

Usage example

#!/usr/bin/env python
# -*- coding: utf8 -*-
"""
Author: klchang
Description:
    A simple example for tf.confusion_matrix and sklearn.metrics.confusion_matrix.
Date: 2018.9.8
"""
from __future__ import print_function
import tensorflow as tf
import sklearn.metrics

y_true = [1, 2, 4]
y_pred = [2, 2, 4]

# Build graph with tf.confusion_matrix operation.
# This uses the TensorFlow 1.x graph/session API; in TensorFlow 2.x, use
# tf.math.confusion_matrix(y_true, y_pred).numpy() with eager execution instead.
sess = tf.InteractiveSession()
op = tf.confusion_matrix(y_true, y_pred)  # num_classes=None -> max(4, 4) + 1 = 5, a 5x5 matrix
op2 = tf.confusion_matrix(y_true, y_pred, num_classes=6, dtype=tf.float32,
                          weights=tf.constant([0.3, 0.4, 0.3]))
# Execute the graph
print("confusion matrix in tensorflow: ")
print("1. default: \n", op.eval())
print("2. customized: \n", sess.run(op2))
sess.close()

# Use sklearn.metrics.confusion_matrix function
print("\nconfusion matrix in scikit-learn: ")
# labels=None -> sorted labels seen in y_true/y_pred, i.e. [1, 2, 4], a 3x3 matrix
print("1. default: \n", sklearn.metrics.confusion_matrix(y_true, y_pred))
print("2. customized: \n", sklearn.metrics.confusion_matrix(
    y_true, y_pred, labels=list(range(6)), sample_weight=[0.3, 0.4, 0.3]))
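
The two "default" calls in this example return different shapes. With num_classes=None, TensorFlow sizes the matrix as max(y_true, y_pred) + 1 = 5, so it includes empty rows and columns for labels 0 and 3. scikit-learn only indexes the labels it sees, [1, 2, 4], and returns a 3x3 matrix. If you need the two layouts to match, one option is to pass scikit-learn the full integer range explicitly. The sketch below shows this and does not need TensorFlow.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 2, 4]
y_pred = [2, 2, 4]

cm_default = confusion_matrix(y_true, y_pred)  # labels [1, 2, 4] -> 3x3

# Same index layout as TensorFlow's default num_classes = max + 1 = 5
n = max(max(y_true), max(y_pred)) + 1
cm_full = confusion_matrix(y_true, y_pred, labels=list(range(n)))  # 5x5

print(cm_default.shape, cm_full.shape)  # (3, 3) (5, 5)
```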


That is all for this example of plotting a confusion matrix with Python's matplotlib. I hope it serves as a useful reference.