Spark MLlib Core Basics: Vectors and Matrices

1. Spark MLlib Core Basics: Vectors and Matrices

1.1 Vector

1.1.1 dense vector

Source definition:

  /**
   * Creates a dense vector from its values.
   */
  @varargs
  def dense(firstValue: Double, otherValues: Double*): Vector =
    new DenseVector((firstValue +: otherValues).toArray)

  // A dummy implicit is used to avoid signature collision with the one generated by @varargs.
  def dense(values: Array[Double]): Vector = new DenseVector(values)

Usage:

scala> import org.apache.spark.mllib.linalg.Vectors

scala> val A1 = (1 to 5).toArray.map { f => f.toDouble }
A1: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0)

scala> val V1 = Vectors.dense(A1)
V1: org.apache.spark.mllib.linalg.Vector = [1.0,2.0,3.0,4.0,5.0]

scala> val V2 = Vectors.dense(2.0, 2.0, 2.0, 2.0, 2.0, 2.0)
V2: org.apache.spark.mllib.linalg.Vector = [2.0,2.0,2.0,2.0,2.0,2.0]
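Because `dense(values: Array[Double])` passes the array straight to `new DenseVector(values)`, the vector appears to share storage with the caller's array instead of copying it. A minimal sketch of that behavior and of basic element access, assuming the spark-mllib jar is on the classpath (for example inside spark-shell); the names `backing` and `v` are illustrative only:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Illustrative names; not taken from the original text.
val backing: Array[Double] = Array(1.0, 2.0, 3.0)
val v: Vector = Vectors.dense(backing)

val n: Int = v.size        // number of elements
val first: Double = v(0)   // zero-based element access

// The array overload wraps the array without copying it,
// so a later write to the array is visible through the vector.
backing(0) = 9.0
val firstAfter: Double = v(0)
```

If the vector must stay independent of the array, passing `backing.clone()` is one way to avoid the sharing.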

1.1.2 sparse vector

Source definition:

  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector =
    new SparseVector(size, indices, values)

  // Bodies of the next two overloads are omitted in this excerpt.
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector = { ... }

  def sparse(size: Int, elements: JavaIterable[(JavaInteger, JavaDouble)]): Vector = { ... }

Usage:

scala> val S1 = Vectors.sparse(5, Array(0, 1, 2, 3, 4), Array(1.0, 2.0, 3.0, 4.0, 5.0))
S1: org.apache.spark.mllib.linalg.Vector = (5,[0,1,2,3,4],[1.0,2.0,3.0,4.0,5.0])

scala> val S2 = Vectors.sparse(5, Seq((0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0)))
S2: org.apache.spark.mllib.linalg.Vector = (5,[0,1,2,3,4],[1.0,2.0,3.0,4.0,5.0])
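S1 and S2 store a value at every index, so they do not show what the sparse format saves. A small sketch with only two non-zero entries, assuming spark-mllib is on the classpath; `sv` and the other names are illustrative:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Length-5 vector whose only non-zero entries sit at indices 1 and 3.
val sv: Vector = Vectors.sparse(5, Array(1, 3), Array(2.0, 4.0))

val size: Int = sv.size                  // logical length, zeros included
val asArray: Array[Double] = sv.toArray  // expands to all 5 positions
val at3: Double = sv(3)                  // a stored entry
val at0: Double = sv(0)                  // a position that is not stored
```

Only the two (index, value) pairs are kept in memory; positions that are not listed read back as 0.0.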

1.2 Matrix

1.2.1 dense matrix

Source definition:

  def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix = {
    new DenseMatrix(numRows, numCols, values)
  }

Usage:

scala> import org.apache.spark.mllib.linalg.Matrices

scala> val A2 = (1 to 25).toArray.map { f => f.toDouble }
A2: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0)

scala> val M1 = Matrices.dense(5, 5, A2)
M1: org.apache.spark.mllib.linalg.Matrix =
1.0  6.0   11.0  16.0  21.0
2.0  7.0   12.0  17.0  22.0
3.0  8.0   13.0  18.0  23.0
4.0  9.0   14.0  19.0  24.0
5.0  10.0  15.0  20.0  25.0
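The printed matrix shows that `Matrices.dense` reads `values` in column-major order: the first five numbers fill the first column, not the first row. Put differently, entry (i, j) is `values(i + j * numRows)`. A small sketch on a 2 x 3 matrix, assuming spark-mllib is on the classpath; the variable names are illustrative:

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// Column-major input: columns are (1, 2), (3, 4), (5, 6).
//   1.0  3.0  5.0
//   2.0  4.0  6.0
val values: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
val m: Matrix = Matrices.dense(2, 3, values)

val x: Double = m(0, 1)                    // row 0, column 1
val viaFormula: Double = values(0 + 1 * 2) // same entry via i + j * numRows

val t: Matrix = m.transpose                // 3 x 2 matrix
val tx: Double = t(1, 0)                   // mirrors m(0, 1)
```

Data that is naturally row-major can be loaded by building the matrix with the row and column counts swapped and then calling `transpose`.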

1.2.2 sparse matrix

Source definition:

  def sparse(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      values: Array[Double]): Matrix = {
    new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values)
  }
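Usage can be sketched in the same spirit as the earlier sections. The parameters follow the compressed sparse column (CSC) layout: the entries of column j are those at positions `colPtrs(j)` up to, but not including, `colPtrs(j + 1)` in `rowIndices` and `values`. The 3 x 2 matrix and the name `sm` below are made up for illustration, and the sketch assumes spark-mllib is on the classpath:

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// Target matrix (3 rows, 2 columns):
//   9.0  0.0
//   0.0  8.0
//   0.0  6.0
val sm: Matrix = Matrices.sparse(
  3, 2,
  Array(0, 1, 3),        // colPtrs: column 0 owns entry 0, column 1 owns entries 1 and 2
  Array(0, 1, 2),        // rowIndices of the stored entries
  Array(9.0, 8.0, 6.0))  // values of the stored entries

val a: Double = sm(0, 0)
val b: Double = sm(2, 1)
val z: Double = sm(1, 0)                  // not stored
val colMajor: Array[Double] = sm.toArray  // dense column-major expansion
```

Row indices are listed in ascending order within each column here, which keeps element lookup predictable.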

Distributed block matrix. Reference: http://de.wikipedia.org/wiki/Blockmatrix
