Spark MLlib Isotonic Regression

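Scan the sequence forward from its first element. As soon as an out-of-order pair appears, stop scanning and start absorbing elements one at a time, beginning with the element that breaks the order, to form a subsequence. Keep absorbing until the mean of the subsequence is less than or equal to the next element waiting to be absorbed, then replace every element of the subsequence with that mean.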
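For reference, the procedure above can be read as a way of solving the least-squares problem that defines isotonic regression. A sketch of that problem follows, assuming unit weights; the symbols y_i (observed values) and x_i (fitted values) are introduced here for illustration and do not come from the original text:

```latex
\min_{x_1,\dots,x_n} \; \sum_{i=1}^{n} (y_i - x_i)^2
\quad \text{subject to} \quad x_1 \le x_2 \le \dots \le x_n
```

When a run of elements is forced to share a single value, the mean of the run is the value that minimizes its squared error, which is why out-of-order runs are replaced by their average.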

Examples:

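Original sequence: <9, 10, 14>

Result sequence: <9, 10, 14>

Analysis: scanning forward from 9 through the last element, 14, finds no out-of-order pair, so nothing needs to change.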

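Original sequence: <9, 14, 10>

Result sequence: <9, 12, 12>

Analysis: scanning forward from 9, the order breaks at 14 (14 > 10), so scanning stops and absorption begins. Absorbing 10 gives the subsequence <14, 10>, whose mean is 12, so <12, 12> replaces <14, 10>. Since 10 was the last element, processing is complete.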

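Original sequence: <14, 9, 10, 15>

Result sequence: <11, 11, 11, 15>

Analysis: scanning forward from 14, the order breaks immediately at 9 (14 > 9), so scanning stops and absorption begins. Absorbing 9 gives the subsequence <14, 9>, whose mean is 11.5. Because 11.5 is greater than the next element, 10, absorb 10 as well to get <14, 9, 10>, whose mean is 11. Because 11 is less than the next element, 15, absorption stops and <11, 11, 11> replaces <14, 9, 10>.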
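The three examples above can be reproduced with a short stand-alone sketch of the pool-adjacent-violators idea. This is an illustration in plain Scala, not Spark's implementation: the names `PoolAdjacentViolators`, `Block`, `fit`, and `merge` are hypothetical, and all weights are assumed to be 1. One step goes beyond the description above: after a run is pooled, the sketch also merges it backwards if its mean has dropped below the block before it, a case the three examples never trigger.

```scala
object PoolAdjacentViolators {
  // A block of pooled neighbours: the sum of its values and how many it holds.
  private final case class Block(sum: Double, count: Int) {
    def mean: Double = sum / count
  }

  // Returns the non-decreasing sequence obtained by averaging out-of-order runs.
  def fit(ys: Seq[Double]): Vector[Double] = {
    // Blocks are kept in reverse order: the head is the most recent block.
    val blocks = ys.foldLeft(List.empty[Block]) { (stack, y) =>
      merge(Block(y, 1) :: stack)
    }
    // Expand every block back into `count` copies of its mean.
    blocks.reverse.flatMap(b => Vector.fill(b.count)(b.mean)).toVector
  }

  // Merge the newest block into the previous one while their means are out of order.
  @annotation.tailrec
  private def merge(stack: List[Block]): List[Block] = stack match {
    case last :: prev :: rest if prev.mean > last.mean =>
      merge(Block(prev.sum + last.sum, prev.count + last.count) :: rest)
    case _ => stack
  }
}

object PoolAdjacentViolatorsDemo extends App {
  println(PoolAdjacentViolators.fit(Seq(9.0, 10.0, 14.0)))       // Vector(9.0, 10.0, 14.0)
  println(PoolAdjacentViolators.fit(Seq(9.0, 14.0, 10.0)))       // Vector(9.0, 12.0, 12.0)
  println(PoolAdjacentViolators.fit(Seq(14.0, 9.0, 10.0, 15.0))) // Vector(11.0, 11.0, 11.0, 15.0)
}
```

Keeping a sum and a count per block, rather than the elements themselves, means each merge is a constant-time update.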

package com.immooc.spark

import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.regression.IsotonicRegression
import org.apache.spark.{SparkConf, SparkContext}

object Isotonic_Regression {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Isotonic_Regression").setMaster("local[2]")
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)

    // Each input line is parsed as "label,feature"; every point gets a weight of 1.0.
    val data = sc.textFile("file:///Users/walle/Documents/D3/sparkmlib/sample_isotonic_regression_data.txt")
    val parsedData = data.map { line =>
      val parts = line.split(',').map(_.toDouble)
      (parts(0), parts(1), 1.0)
    }

    // 60% of the points for training, 40% for testing.
    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0)
    val test = splits(1)

    // setIsotonic(true) fits a non-decreasing function.
    val model = new IsotonicRegression().setIsotonic(true).run(training)

    // Boundaries are feature values; predictions are the fitted labels at those boundaries.
    println("boundaries" + "\t" + "predictions")
    model.boundaries.zip(model.predictions).foreach { case (x, y) =>
      println(s"$x\t$y")
    }

    // Pair each predicted label with the true label of the test point.
    val predictionAndLabel = test.map { point =>
      val predictedLabel = model.predict(point._2)
      (predictedLabel, point._1)
    }

    println("prediction" + "\t" + "label")
    predictionAndLabel.collect().foreach { case (p, l) =>
      println(s"$p\t$l")
    }

    val meanSquaredError = predictionAndLabel.map { case (p, l) => math.pow(p - l, 2) }.mean()
    println("Mean Squared Error = " + meanSquaredError)

    sc.stop()
  }
}

3. Result

boundaries	predictions
0.01	0.168689444
0.17	0.168689444
0.18	0.19545421571428573
0.27	0.19545421571428573
0.28	0.20040796
0.3	0.43396226
0.31	0.5275369700000001
0.32	0.5275369700000001
0.35	0.54156043
0.36	0.5602243760000001
0.41	0.5602243760000001
0.44	0.5936596972222222
0.76	0.5936596972222222
0.77	0.64762876
0.79	0.6807751133333334
0.82	0.6807751133333334
0.83	0.73890872
0.84	0.73992861
0.86	0.89673636
0.87	0.89673636
0.9	0.93115757
0.95	0.93472718
1.0	0.93472718

prediction	label
0.168689444	0.31208567
0.168689444	0.35900051
0.168689444	0.03926568
0.168689444	0.12952575
0.168689444	0.0
0.168689444	0.01376849
0.168689444	0.13105558
0.19545421571428573	0.13717491
0.19545421571428573	0.19020908
0.19545421571428573	0.19581846
0.31718510999999966	0.29576747
0.5322114566666667	0.4854666
0.5368859433333334	0.49209587
0.5602243760000001	0.5017848
0.5713694830740741	0.58286588
0.5825145901481482	0.64660887
0.5936596972222222	0.65782764
0.5936596972222222	0.63029067
0.5936596972222222	0.65323814
0.5936596972222222	0.67006629
0.5936596972222222	0.51555329
0.5936596972222222	0.33299337
0.5936596972222222	0.36206017
0.5936596972222222	0.4309026
0.5936596972222222	0.48393677
0.5936596972222222	0.48495665
0.5936596972222222	0.4518103
0.5936596972222222	0.47118817
0.5936596972222222	0.58031617
0.5936596972222222	0.55481897
0.5936596972222222	0.56603774
0.6642019366666667	0.66241713
0.6807751133333334	0.65119837
0.818332485	0.84242733
0.9082100966666666	0.90719021
0.9196838333333334	0.93115757
0.931871492	0.91942886
0.932585414	0.9291178
0.933299336	0.95665477
0.934013258	0.9500255
0.93472718	0.89801122
0.93472718	0.90311066
0.93472718	0.9036206

Mean Squared Error = 0.008860256490591361
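Some predictions in the output, such as 0.31718510999999966, do not appear in the boundaries table. According to the Spark documentation for IsotonicRegressionModel, a feature that falls between two boundaries is predicted by linear interpolation, and a feature outside the boundary range gets the first or last prediction. The following is a minimal sketch of that rule in plain Scala, not Spark's own code: `IsotonicPredictSketch` and its `predict` signature are hypothetical, and the feature value 0.29 is an inference from the tables above, since the program does not print test features.

```scala
object IsotonicPredictSketch {
  // Piecewise-linear lookup over ascending boundaries and their fitted predictions.
  def predict(boundaries: Array[Double], predictions: Array[Double], x: Double): Double = {
    require(boundaries.nonEmpty && boundaries.length == predictions.length)
    if (x <= boundaries.head) predictions.head      // at or below the first boundary
    else if (x >= boundaries.last) predictions.last // at or above the last boundary
    else {
      // Index of the first boundary >= x; at least 1 because x > boundaries.head.
      val hi = boundaries.indexWhere(_ >= x)
      val lo = hi - 1
      val (x0, x1) = (boundaries(lo), boundaries(hi))
      val (y0, y1) = (predictions(lo), predictions(hi))
      // Linear interpolation between the two surrounding boundaries.
      y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    }
  }
}

object IsotonicPredictDemo extends App {
  // Two adjacent rows taken from the boundaries table above.
  val boundaries = Array(0.28, 0.3)
  val predictions = Array(0.20040796, 0.43396226)
  // Assumed feature value 0.29, halfway between the two boundaries.
  println(IsotonicPredictSketch.predict(boundaries, predictions, 0.29))
}
```

Under that assumption the sketch returns approximately 0.317185, the midpoint of the two neighbouring predictions, which agrees with the corresponding row of the prediction table.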
