從該序列的首元素往後觀察,一旦出現亂序現象停止該輪觀察,從該亂序元素開始逐個吸收元素組成一個序列,直到該序列所有元素的平均值小于或等于下一個待吸收的元素。
舉例:
原始序列:<9, 10, 14>
結果序列:<9, 10, 14>
分析:從9往後觀察,到最後的元素14都未發現亂序情況,不用處理。
原始序列:<9, 14, 10>
結果序列:<9, 12, 12>
分析:從9往後觀察,觀察到14時發生亂序(14>10),停止該輪觀察轉入吸收元素處理,吸收元素10後子序列為<14, 10>,取該序列所有元素的平均值得12,故用序列<12, 12>替代<14, 10>。吸收10後已經到了最後的元素,處理操作完成。
原始序列:<14, 9, 10, 15>
結果序列:<11, 11, 11, 15>
分析:從14往後觀察,觀察到9時發生亂序(14>9),停止該輪觀察轉入吸收元素處理,吸收元素9後子序列為<14,9>。求該序列所有元素的平均值得12.5,由于12.5大于下個待吸收的元素10,是以再吸收10,得序列<14, 9, 10>。求該序列所有元素的平均值得11,由于11小于下個待吸收的元素15,是以停止吸收操作,用序列<11, 11, 11>替代<14, 9, 10>。
package com.immooc.spark
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.regression.IsotonicRegression
import org.apache.spark.{SparkConf, SparkContext}
object Isotonic_Regression {
def main(args:Array[String]): Unit = {
val conf = new SparkConf().setAppName("Isotonic_Regression").setMaster("local[2]")
val sc = new SparkContext(conf)
Logger.getRootLogger.setLevel(Level.WARN)
val data = sc.textFile("file:///Users/walle/Documents/D3/sparkmlib/sample_isotonic_regression_data.txt")
val parsedData = data.map{
line=>
val parts = line.split(',').map(_.toDouble)
(parts(0), parts(1), 1.0)
}
val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0)
val test = splits(1)
val model = new IsotonicRegression().setIsotonic(true).run(training)
val x = model.boundaries
val y = model.predictions
println("boundaries" + "\t" + "predictions")
for (i <- 0 to x.length -1){
println(x(i) + "\t" + y(i))
}
val predictionAndLabel = test.map{
point =>
val predictedLabel = model.predict(point._2)
(predictedLabel, point._1)
}
val print_predict = predictionAndLabel.collect
println("prediction" + "\t" + "label")
for (i <- 0 to print_predict.length - 1) {
println(print_predict(i)._1 + "\t" + print_predict(i)._2)
}
val meanSquaredError = predictionAndLabel.map { case (p, l) => math.pow((p - l), 2) }.mean()
println("Mean Squared Error = " + meanSquaredError)
}
}
3. Result
boundaries predictions
0.01 0.168689444
0.17 0.168689444
0.18 0.19545421571428573
0.27 0.19545421571428573
0.28 0.20040796
0.3 0.43396226
0.31 0.5275369700000001
0.32 0.5275369700000001
0.35 0.54156043
0.36 0.5602243760000001
0.41 0.5602243760000001
0.44 0.5936596972222222
0.76 0.5936596972222222
0.77 0.64762876
0.79 0.6807751133333334
0.82 0.6807751133333334
0.83 0.73890872
0.84 0.73992861
0.86 0.89673636
0.87 0.89673636
0.9 0.93115757
0.95 0.93472718
1.0 0.93472718
prediction label
0.168689444 0.31208567
0.168689444 0.35900051
0.168689444 0.03926568
0.168689444 0.12952575
0.168689444 0.0
0.168689444 0.01376849
0.168689444 0.13105558
0.19545421571428573 0.13717491
0.19545421571428573 0.19020908
0.19545421571428573 0.19581846
0.31718510999999966 0.29576747
0.5322114566666667 0.4854666
0.5368859433333334 0.49209587
0.5602243760000001 0.5017848
0.5713694830740741 0.58286588
0.5825145901481482 0.64660887
0.5936596972222222 0.65782764
0.5936596972222222 0.63029067
0.5936596972222222 0.65323814
0.5936596972222222 0.67006629
0.5936596972222222 0.51555329
0.5936596972222222 0.33299337
0.5936596972222222 0.36206017
0.5936596972222222 0.4309026
0.5936596972222222 0.48393677
0.5936596972222222 0.48495665
0.5936596972222222 0.4518103
0.5936596972222222 0.47118817
0.5936596972222222 0.58031617
0.5936596972222222 0.55481897
0.5936596972222222 0.56603774
0.6642019366666667 0.66241713
0.6807751133333334 0.65119837
0.818332485 0.84242733
0.9082100966666666 0.90719021
0.9196838333333334 0.93115757
0.931871492 0.91942886
0.932585414 0.9291178
0.933299336 0.95665477
0.934013258 0.9500255
0.93472718 0.89801122
0.93472718 0.90311066
0.93472718 0.9036206
Mean Squared Error = 0.008860256490591361