The previous sections focused mainly on classification; starting with this section we move on to regression. This section introduces several commonly used numeric regression algorithms.
1. Linear Regression
(Figure: a straight-line fit of the data.)
Squared-error loss function:

$\sum_{i=1}^{m} \left(y_i - x_i^T w\right)^2$

Minimizing this loss (setting its derivative with respect to $w$ to zero) yields the normal equation, giving the regression coefficients:

$\hat{w} = \left(X^T X\right)^{-1} X^T y$
Main implementation:
from numpy import *

def standRegres(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T * xMat
    if linalg.det(xTx) == 0.0:    # a singular matrix cannot be inverted
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * yMat)  # normal equation: w = (X^T X)^-1 X^T y
    return ws
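A minimal usage sketch with made-up toy data (the book's data files and loaders are not shown here, so xArr and yArr below are purely hypothetical); column 0 of X holds the constant term, and corrcoef measures how closely the fit tracks the targets:

xArr = [[1.0, 0.1], [1.0, 0.5], [1.0, 0.9]]  # hypothetical data; column 0 is the constant term x0
yArr = [1.2, 2.0, 2.9]
ws = standRegres(xArr, yArr)
yHat = mat(xArr) * ws                        # fitted values
print(corrcoef(yHat.T, mat(yArr)))           # correlation between fit and truth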
2. Locally Weighted Linear Regression
Because ordinary linear regression may underfit, locally weighted linear regression (LWLR) is introduced: each training sample is given a different weight according to its distance from the point being predicted.
To express these weights a kernel is introduced; the most commonly used is the Gaussian kernel:

$w(i, i) = \exp\left(\frac{\|x^{(i)} - x\|^2}{-2k^2}\right)$

(Figure: how the weights $w$ fall off around the query point for different values of $k$.)

Regression coefficients:

$\hat{w} = \left(X^T W X\right)^{-1} X^T W y$
Main implementation:
def lwlr(testPoint, xArr, yArr, k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye(m))              # start from an identity weight matrix
    for j in range(m):                 # fill the diagonal with Gaussian kernel weights
        diffMat = testPoint - xMat[j, :]
        weights[j, j] = exp(diffMat * diffMat.T / (-2.0 * k**2))
    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * (weights * yMat))  # w = (X^T W X)^-1 X^T W y
    return testPoint * ws

def lwlrTest(testArr, xArr, yArr, k=1.0):  # applies lwlr to every point in testArr
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i], xArr, yArr, k)
    return yHat

def lwlrTestPlot(xArr, yArr, k=1.0):  # same as lwlrTest except it sorts X first
    yHat = zeros(shape(yArr))         # easier for plotting
    xCopy = mat(xArr)
    xCopy.sort(0)
    for i in range(shape(xArr)[0]):
        yHat[i] = lwlr(xCopy[i], xArr, yArr, k)
    return yHat, xCopy
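A hypothetical usage sketch (same made-up toy data idea as above). The kernel width k controls the bias/variance tradeoff: a large k approaches ordinary least squares, while a small k fits only the nearest points and may overfit:

xArr = [[1.0, 0.1], [1.0, 0.3], [1.0, 0.5], [1.0, 0.7], [1.0, 0.9]]  # hypothetical data
yArr = [1.1, 1.4, 1.9, 2.1, 2.6]
print(lwlr(xArr[0], xArr, yArr, k=1.0))   # one prediction, nearly a global fit
yHat = lwlrTest(xArr, xArr, yArr, k=0.1)  # smaller k -> more local, higher variance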
3. Ridge Regression and Forward Stagewise Regression
What if there are more features than samples, so the data matrix X has more columns than rows? Then X^T X is not full rank and cannot be inverted. Shrinkage methods were originally introduced to solve this problem, and ridge regression is one of them. In short, ridge regression adds lambda*I to the matrix X^T X so that the result is full rank and invertible. Ridge regression also introduces bias into the estimate in exchange for a better overall estimate: the penalty lambda constrains the magnitude of the coefficients w, shrinking unimportant parameters toward zero. In statistics this technique is known as shrinkage.
Regression coefficients:

$\hat{w} = \left(X^T X + \lambda I\right)^{-1} X^T y$
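A small numeric sketch (plain NumPy, not from the book) of why the lambda*I term helps: when one sample is a multiple of another, X^T X is rank-deficient and its determinant is zero, but adding lambda*I makes it invertible:

from numpy import *

X = mat([[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0]])                 # 2 samples, 3 features: X^T X is rank-deficient
print(linalg.det(X.T * X))                 # ~0: singular, no inverse exists
lam = 0.1
print(linalg.det(X.T * X + lam * eye(3)))  # nonzero: the ridge term restores invertibility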
def rssError(yArr, yHatArr):  # yArr and yHatArr both need to be arrays
    return ((yArr - yHatArr)**2).sum()

def ridgeRegres(xMat, yMat, lam=0.2):
    xTx = xMat.T * xMat
    denom = xTx + eye(shape(xMat)[1]) * lam  # X^T X + lambda*I
    if linalg.det(denom) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = denom.I * (xMat.T * yMat)
    return ws

def ridgeTest(xArr, yArr):
    xMat = mat(xArr); yMat = mat(yArr).T
    yMean = mean(yMat, 0)
    yMat = yMat - yMean           # to eliminate X0 take mean off of Y
    # standardize X: calc mean then subtract it off, calc variance then divide by it
    xMeans = mean(xMat, 0)
    xVar = var(xMat, 0)
    xMat = (xMat - xMeans) / xVar
    numTestPts = 30
    wMat = zeros((numTestPts, shape(xMat)[1]))
    for i in range(numTestPts):   # try lambda values on an exponential scale
        ws = ridgeRegres(xMat, yMat, exp(i - 10))
        wMat[i, :] = ws.T
    return wMat

def regularize(xMat):  # standardize by columns
    inMat = xMat.copy()
    inMeans = mean(inMat, 0)  # calc mean then subtract it off
    inVar = var(inMat, 0)     # calc variance of Xi then divide by it
    inMat = (inMat - inMeans) / inVar
    return inMat
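A hypothetical usage sketch: ridgeTest fits one weight vector per lambda = exp(i - 10) for i = 0..29, so comparing the first and last rows shows the coefficients shrinking toward zero as lambda grows:

xArr = [[0.1, 1.2], [0.5, 0.7], [0.9, 0.1], [0.3, 0.4]]  # hypothetical data
yArr = [1.0, 1.5, 2.0, 1.2]
wMat = ridgeTest(xArr, yArr)  # 30 weight vectors, one per lambda
print(wMat[0, :])             # tiny lambda: close to ordinary least squares
print(wMat[29, :])            # huge lambda: coefficients shrunk toward zero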
Forward stagewise regression:
Algorithm pseudocode:

Standardize the data to zero mean and unit variance
For every iteration:
    Set lowestError to +inf
    For every feature:
        For increasing and decreasing the coefficient:
            Change one coefficient by eps to get a new W
            Compute the error under the new W
            If the error is lower than lowestError: set Wbest to the current W
    Update W to Wbest
def stageWise(xArr, yArr, eps=0.01, numIt=100):
    xMat = mat(xArr); yMat = mat(yArr).T
    yMean = mean(yMat, 0)
    yMat = yMat - yMean  # can also regularize ys but will get smaller coef
    xMat = regularize(xMat)
    m, n = shape(xMat)
    returnMat = zeros((numIt, n))  # history of the weights at every iteration
    ws = zeros((n, 1)); wsTest = ws.copy(); wsMax = ws.copy()
    for i in range(numIt):
        print(ws.T)
        lowestError = inf
        for j in range(n):            # try nudging each feature's weight
            for sign in [-1, 1]:      # in both directions
                wsTest = ws.copy()
                wsTest[j] += eps * sign
                yTest = xMat * wsTest
                rssE = rssError(yMat.A, yTest.A)
                if rssE < lowestError:  # keep the nudge that lowers the error most
                    lowestError = rssE
                    wsMax = wsTest
        ws = wsMax.copy()
        returnMat[i, :] = ws.T
    return returnMat
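A hypothetical usage sketch; features whose coefficients stay pinned at zero across the returned history are the ones the model treats as unimportant:

xArr = [[0.1, 1.2], [0.5, 0.7], [0.9, 0.1], [0.3, 0.4]]  # hypothetical data
yArr = [1.0, 1.5, 2.0, 1.2]
wHist = stageWise(xArr, yArr, eps=0.01, numIt=200)
# wHist[i, :] is the coefficient vector after iteration i; plot the rows to see the paths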
4. Trading Off Bias and Variance
By cross-validating ridge regression over many values of lambda, we can pick the model with the lowest test error, trading bias against variance, and uncover which features are important and which are not.
Implementation:
def crossValidation(xArr, yArr, numVal=10):
    m = len(yArr)
    indexList = list(range(m))
    errorMat = zeros((numVal, 30))  # numVal rows, one column per lambda value
    for i in range(numVal):
        trainX = []; trainY = []
        testX = []; testY = []
        random.shuffle(indexList)
        for j in range(m):  # first 90% of the shuffled indices form the training set
            if j < m * 0.9:
                trainX.append(xArr[indexList[j]])
                trainY.append(yArr[indexList[j]])
            else:
                testX.append(xArr[indexList[j]])
                testY.append(yArr[indexList[j]])
        wMat = ridgeTest(trainX, trainY)  # get 30 weight vectors from ridge
        for k in range(30):               # loop over all of the ridge estimates
            matTestX = mat(testX); matTrainX = mat(trainX)
            meanTrain = mean(matTrainX, 0)
            varTrain = var(matTrainX, 0)
            matTestX = (matTestX - meanTrain) / varTrain  # standardize test with training params
            yEst = matTestX * mat(wMat[k, :]).T + mean(trainY)  # test ridge results and store
            errorMat[i, k] = rssError(yEst.T.A, array(testY))
    meanErrors = mean(errorMat, 0)  # avg performance of the different ridge weight vectors
    minMean = float(min(meanErrors))
    bestWeights = wMat[nonzero(meanErrors == minMean)]
    # can un-standardize to recover the model on the original scale:
    # we standardized with Xreg = (x - meanX) / var(x), so in terms of x the
    # model is x * w/var(x) - meanX * w/var(x) + meanY
    xMat = mat(xArr); yMat = mat(yArr).T
    meanX = mean(xMat, 0); varX = var(xMat, 0)
    unReg = bestWeights / varX
    print("the best model from Ridge Regression is:\n", unReg)
    print("with constant term: ", -1 * sum(multiply(meanX, unReg)) + mean(yMat))