
RDKit | Predicting logP with Support Vector Regression (SVR)

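RDKit is a Python library for cheminformatics. This post uses support vector regression (SVR) to predict logP. The input features are the Morgan fingerprints of each molecule, and the output is logP. The target logP values are themselves computed with RDKit's Crippen MolLogP.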

Code example:

Import the dependencies

    import numpy as np
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem
    from rdkit.Chem.Crippen import MolLogP
    from sklearn.svm import SVR
    from sklearn.metrics import mean_squared_error, r2_score
    from scipy import stats
    import matplotlib.pyplot as plt
               

Load the SMILES molecule library and compute Morgan fingerprints and logP

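    num_mols = 5000
    with open('smiles.txt', 'r') as f:
        contents = f.readlines()

    fps_total = []
    logP_total = []
    for line in contents[:num_mols]:
        smi = line.split()[0]  # the SMILES string is the first token on each line
        m = Chem.MolFromSmiles(smi)
        if m is None:  # skip SMILES that RDKit cannot parse
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(m, 2)  # Morgan fingerprint, radius 2
        arr = np.zeros((1,))
        DataStructs.ConvertToNumpyArray(fp, arr)  # copy the bit vector into a NumPy array
        fps_total.append(arr)
        logP_total.append(MolLogP(m))  # Crippen logP used as the regression target

    fps_total = np.asarray(fps_total)
    logP_total = np.asarray(logP_total)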
           

Split into training and test sets

    num_total = fps_total.shape[0]
    num_train = int(num_total * 0.8)  # first 80% for training, the rest for testing
    print(num_total, num_train, num_total - num_train)

    fps_train = fps_total[0:num_train]
    logP_train = logP_total[0:num_train]
    fps_test = fps_total[num_train:]
    logP_test = logP_total[num_train:]
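
The split above takes the first 80% of the molecules in file order. If smiles.txt happens to be ordered in some way (for example by molecule size), a shuffled split may give a more representative test set. The following is only a minimal sketch of that alternative, not what the original code does; the helper name shuffled_split and the fixed seed are assumptions made for illustration.

```python
import random


def shuffled_split(num_total, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) as lists of shuffled indices.

    Hypothetical helper for illustration; not part of the original post.
    """
    indices = list(range(num_total))
    # A local Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    num_train = int(num_total * train_fraction)
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = shuffled_split(10)
print(len(train_idx), len(test_idx))
```

With NumPy arrays, the index lists can be used directly, e.g. fps_total[train_idx] and logP_total[train_idx].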
           

Fit an SVR model for the regression

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

    _gamma = 5.0
    clf = SVR(kernel='poly', gamma=_gamma)
    clf.fit(fps_train, logP_train)
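
To make the role of gamma more concrete: the scikit-learn documentation gives the polynomial kernel as (gamma * <x, x'> + coef0) ** degree, and SVR defaults to degree=3 and coef0=0.0, which the call above leaves unchanged. The sketch below evaluates that formula by hand on two toy bit vectors. The function name poly_kernel and the 5-bit "fingerprints" are made up for illustration; scikit-learn computes the kernel internally.

```python
def poly_kernel(x, y, gamma=5.0, coef0=0.0, degree=3):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree.

    Illustrative re-implementation of the formula, not scikit-learn's code.
    """
    # For 0/1 vectors the dot product counts the bits set in both vectors.
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


fp_a = [1, 1, 0, 1, 0]  # toy 5-bit vectors, not real Morgan fingerprints
fp_b = [1, 0, 0, 1, 1]
print(poly_kernel(fp_a, fp_b))  # (5.0 * 2 + 0.0) ** 3 = 1000.0
```

Because the kernel grows with the cube of the number of shared bits, molecules with many common substructure bits are weighted far more strongly than molecules with few.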
               

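After training, check how accurate the predictions are. For evaluation, we use the R² score and the mean squared error (MSE) as metrics.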

    logP_pred = clf.predict(fps_test)
    r2 = r2_score(logP_test, logP_pred)
    mse = mean_squared_error(logP_test, logP_pred)
    print(r2, mse)
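
The two metrics can also be written out directly, which may help when interpreting them. This is a minimal sketch of the standard definitions, assuming the true values are not all identical (otherwise the R² denominator is zero); the names mse_manual and r2_manual are hypothetical and chosen so they do not clash with the variables above.

```python
def mse_manual(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(mse_manual(y_true, y_pred), r2_manual(y_true, y_pred))
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the true values.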
               

Visualize the model results

    # Fit a straight line through (true logP, predicted logP)
    slope, intercept, r_value, p_value, std_error = stats.linregress(logP_test, logP_pred)
    yy = slope * logP_test + intercept

    plt.scatter(logP_test, logP_pred, color='black', s=1)
    plt.plot(logP_test, yy,
             label='Predicted logP = ' + str(round(slope, 2)) + '*True logP + ' + str(round(intercept, 2)))
    plt.xlabel('True logP')
    plt.ylabel('Predicted logP')
    plt.legend()
    plt.show()
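
The slope and intercept drawn in the plot come from an ordinary least-squares line fit. As a rough sketch of what that fit computes (the helper name fit_line is hypothetical, and stats.linregress additionally returns the correlation, p-value and standard error):

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


print(fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))  # (2.0, 1.0)
```

For a model whose predictions track the true values well, the fitted slope should be close to 1 and the intercept close to 0.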
               

References

https://github.com/SeongokRyu/CH485---Artificial-Intelligence-and-Chemistry

https://blog.csdn.net/zb123455445/article/details/78354489