天天看點

基于小波包和随機森林的uOttawa軸承資料集分類

本文采用小波包分解和随機森林分類器對uOttawa軸承資料集進行分類,比較簡單,直接看代碼就可以看懂,并可遷移至其他的一維資料集,比如心電信号,肌電信号,腦電信号,微振信号,各種聲信号等等,順便把python學一下,結合自己的領域學python能有效避免勸退。

資料集分為5類,分别為健康狀态,内圈故障,外圈故障,滾動體故障和複合故障

基于小波包和随機森林的uOttawa軸承資料集分類

其中複合故障資料檔案CompF内容如下

基于小波包和随機森林的uOttawa軸承資料集分類

健康工況資料檔案Healthy内容如下

基于小波包和随機森林的uOttawa軸承資料集分類

首先導入相關的信号處理子產品,若沒有小波子產品pywt,要首先pip install pywt

import glob
from scipy.io import loadmat
from numpy import asarray
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal
import scipy
import re
import os
import pandas as pd
import pywt
from scipy.fftpack import fft
from warnings import warn
from sklearn import metrics

import warnings 
warnings.filterwarnings('ignore')
           

然後定義個FFT函數便于以後的特征提取(頻譜中較高幅值及其對應的頻率)

def apply_fft(x, fs, num_samples):
    f = np.linspace(0.0, (fs/2.0), num_samples//2)
    freq_values = fft(x)
    freq_values = 2.0/num_samples * np.abs(freq_values[0:num_samples//2])
    return f, freq_values
           

然後定義一個從原始振動信号建立資料集的函數,該函數處理.mat 振動資料檔案并對振動信号進行分割

def make_dataset(data_src, num_samples, class_):
    files = glob.glob(data_src)
    files = np.sort(files)
    data = loadmat(files[0])
    keysList = sorted(data.keys())
    key = keysList[0]
    drive_end_data = data[key]
    drive_end_data = drive_end_data.reshape(-1)
    num_segments = np.floor(len(drive_end_data)/num_samples)
    slices = np.split(drive_end_data[0:int(num_segments*num_samples)], num_samples)
    silces = np.array(slices).reshape(int(num_segments), num_samples)
    segmented_data = silces
    files = files[1:]
    for file in files:
        data = loadmat(file)
        keysList = sorted(data.keys())
        key = keysList[0]
        drive_end_data = data[key]
        drive_end_data = drive_end_data.reshape(-1)
        num_segments = np.floor(len(drive_end_data)/num_samples)
        slices = np.split(drive_end_data[0:int(num_segments*num_samples)], num_samples)
        silces = np.array(slices).reshape(int(num_segments), num_samples)
        segmented_data = np.concatenate( (segmented_data, silces) , axis=0, out=None)
    
    segmented_data = np.unique(segmented_data, axis= 0) # 删除重複項
    np.random.shuffle( segmented_data) #打亂資料
    Class_ = np.ones(len(segmented_data))*class_
    
    return segmented_data, Class_
           

對振動信号進行分組以生成資料集,下載下傳資料集後,振動信号将根據其運作條件/屬性*分組在 5 個檔案夾中(對應于資料集中的運作類别數:1 個正常類别和 4 個故障類别),檔案夾名稱如下:

#Healthy
#IR
#OR
#BF
#CompF
#*IR = Inner Race fault
#*OR = Outer Race faults
#*BF = Ball faults
#*CompF = Combination of faults


num_samples = 40000 # 輸入長度
fs = 200000; # 采樣頻率
data_path = (r"D:\dataset") #路徑

cls_1 = 'Healthy/*'
cls_2 = 'IR/*'
cls_3 = 'OR/*'
cls_4 = 'BF/*'
cls_5 = 'CompF/*'
           

資料集建立

norm, y_norm   = make_dataset(os.path.join(data_path, cls_1), num_samples, 0)
defc1, y_defc1 = make_dataset(os.path.join(data_path, cls_2), num_samples, 1)
defc2, y_defc2 = make_dataset(os.path.join(data_path, cls_3), num_samples, 2)
defc3, y_defc3 = make_dataset(os.path.join(data_path, cls_4), num_samples, 3)
defc4, y_defc4 = make_dataset(os.path.join(data_path, cls_5), num_samples, 4)

X = np.concatenate( (norm, defc1, defc2, defc3, defc4) , axis=0, out=None)
Y = np.concatenate( (y_norm, y_defc1, y_defc2, y_defc3, y_defc4), axis=0, out=None)
           

開始特征提取步驟,此處花費時間較長。選擇db4小波,分解層數為7,代碼還是很容易看懂的

wavelet_function = "db4" #db4小波
num_levels = 7 # 分解層數
m = 5  #m個最高幅度及其對應的頻率

#小波包分解和特征提取
num_features = 2**num_levels
features = np.repeat(np.nan, len(X)*m*num_features).reshape(len(X),m*num_features)

for i in range(len(X)):    
    wp = pywt.WaveletPacket(X[i], wavelet = wavelet_function, maxlevel = num_levels) # 小波包分解
    packet_names = [node.path for node in wp.get_level(num_levels, "natural")]
    for j in range(num_features):
        new_wp = pywt.WaveletPacket(data = None, wavelet = wavelet_function, maxlevel = num_levels)
        new_wp[packet_names[j]] = wp[packet_names[j]].data
        reconstructed_signal = new_wp.reconstruct(update = False) #從小波包系數重建信号
        f, c = apply_fft(reconstructed_signal, fs, len(reconstructed_signal))
        z = abs(c)
        
        #求頻譜的m個最高幅度及其對應的頻率
        maximal_idx = np.argpartition(z, -m)[-m:]
        high_amp = z[maximal_idx]
        high_freq = f[maximal_idx]
        feature = high_amp*high_freq
        
        l = 0
        for f in feature:
            features[i,j*m+l] = f
            l = l+1
           

特征提取完畢後,開始基于随機森林的故障分類,導入相關機器學習子產品

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score, confusion_matrix
           

下面進行标簽轉換

labels = pd.Categorical(Y)
           

然後,訓練集和測試集劃分

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size = 0.2, 
                                                    shuffle = True, stratify = labels)#random_state = 42
           

并進行資料标準化

scaler = StandardScaler()
train_data_scaled = scaler.fit_transform(X_train)
test_data_scaled = scaler.transform(X_test)
           

最後開始随機森林訓練

clf_RF = RandomForestClassifier(criterion='entropy', max_features = 1, min_samples_leaf=1, min_samples_split=3, 
                                max_depth=20, n_estimators=200)
clf_RF.fit(train_data_scaled, y_train)
           

看一下混淆矩陣等結果

No. of Samples = 40000 / k = 7 / m = 5

ROC AUC = 1.000

F1 Score = 0.9915254237288136

Accuracy = 99.153 %

基于小波包和随機森林的uOttawa軸承資料集分類

面包多代碼如下

🍞正在為您運送作品詳情