1. Spark MLlib Core Fundamentals: Vectors and Matrices
1.1 Vector
1.1.1 dense vector
Source definition:
/**
 * Creates a dense vector from its values.
 */
@varargs
def dense(firstValue: Double, otherValues: Double*): Vector =
  new DenseVector((firstValue +: otherValues).toArray)
// A dummy implicit is used to avoid signature collision with the one generated by @varargs.
def dense(values: Array[Double]): Vector = new DenseVector(values)
Usage:
scala> import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.Vectors
scala> val A1 = (1 to 5).toArray.map {f => f.toDouble}
A1: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0)
scala> val V1 = Vectors.dense(A1)
V1: org.apache.spark.mllib.linalg.Vector = [1.0,2.0,3.0,4.0,5.0]
scala> val V2 = Vectors.dense(2.0, 2.0, 2.0, 2.0, 2.0, 2.0)
V2: org.apache.spark.mllib.linalg.Vector = [2.0,2.0,2.0,2.0,2.0,2.0]
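The two `dense` overloads above produce the same kind of vector. The following is a minimal sketch, assuming spark-mllib is on the classpath (for example in spark-shell) and a Spark version in which `Vector` exposes a public `apply`; the names `fromVarargs` and `fromArray` are illustrative only.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Varargs overload: the values are listed directly.
val fromVarargs: Vector = Vectors.dense(1.0, 2.0, 3.0)
// Array overload: an existing Array[Double] is wrapped.
val fromArray: Vector = Vectors.dense(Array(1.0, 2.0, 3.0))

// Both hold the same elements in the same order.
val sameContents: Boolean = fromVarargs.toArray.sameElements(fromArray.toArray)

// size is the vector length; apply(i) reads the element at zero-based index i.
val length: Int = fromVarargs.size
val second: Double = fromVarargs(1)
```

The varargs form is convenient for literals, while the array form fits data that already lives in an `Array[Double]`.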
1.1.2 sparse vector
Source definition:
def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector =
  new SparseVector(size, indices, values)
// Two further overloads (signatures only; their bodies are not shown in this excerpt):
def sparse(size: Int, elements: Seq[(Int, Double)]): Vector
def sparse(size: Int, elements: JavaIterable[(JavaInteger, JavaDouble)]): Vector
Usage:
scala> val S1 = Vectors.sparse(5, Array(0, 1, 2, 3, 4), Array(1.0, 2.0, 3.0, 4.0, 5.0))
S1: org.apache.spark.mllib.linalg.Vector = (5,[0,1,2,3,4],[1.0,2.0,3.0,4.0,5.0])
scala> val S2 = Vectors.sparse(5, Seq((0, 1.0), (1, 2.0), (2,3.0), (3,4.0), (4,5.0)))
S2: org.apache.spark.mllib.linalg.Vector = (5,[0,1,2,3,4],[1.0,2.0,3.0,4.0,5.0])
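In S1 and S2 above every index carries a value, so the sparse form saves nothing. A sparse vector pays off when most entries are zero. The sketch below, which assumes spark-mllib is on the classpath and uses illustrative names (`sv`, `svFromPairs`, `expanded`, `storedCount`), stores only two of five entries.

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

// A length-5 vector with non-zero entries only at indices 1 and 3.
val sv: Vector = Vectors.sparse(5, Array(1, 3), Array(2.0, 4.0))

// The same vector built from (index, value) pairs.
val svFromPairs: Vector = Vectors.sparse(5, Seq((1, 2.0), (3, 4.0)))

// toArray expands the implicit zeros into a full array.
val expanded: Array[Double] = sv.toArray

// Pattern matching exposes the indices that are actually stored.
val storedCount: Int = sv match {
  case s: SparseVector => s.indices.length
  case _               => sv.size
}
```

Here `expanded` has five elements while only two values are stored, which is the point of the sparse representation.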
1.2 Matrix
1.2.1 dense matrix
Source definition:
def dense(numRows: Int, numCols: Int, values: Array[Double]): Matrix = {
  new DenseMatrix(numRows, numCols, values)
}
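Usage:
scala> import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.mllib.linalg.Matrices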
scala> val A2 = (1 to 25).toArray.map { f => f.toDouble }
A2: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0)
scala> val M1 = Matrices.dense(5, 5, A2)
M1: org.apache.spark.mllib.linalg.Matrix =
1.0  6.0   11.0  16.0  21.0
2.0  7.0   12.0  17.0  22.0
3.0  8.0   13.0  18.0  23.0
4.0  9.0   14.0  19.0  24.0
5.0  10.0  15.0  20.0  25.0
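The output of M1 shows that the values array fills the matrix column by column (column-major order): 1.0 to 5.0 form the first column, not the first row. The sketch below makes the index arithmetic explicit. It assumes spark-mllib is on the classpath and a Spark version in which `Matrix.apply(i, j)` is public (1.3 or later, as far as I know); the names `m`, `viaApply`, and `viaIndex` are illustrative.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// 2 rows, 3 columns; the array is read column by column.
val m: Matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
// Layout:
//   1.0  3.0  5.0
//   2.0  4.0  6.0

// Element (i, j) sits at position i + j * numRows of the column-major array.
val i = 1
val j = 2
val viaApply: Double = m(i, j)
val viaIndex: Double = m.toArray(i + j * m.numRows)
```

When the source data is laid out row by row, it has to be reordered (or the dimensions swapped and the result transposed) before it is passed to `Matrices.dense`.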
1.2.2 sparse matrix
Source definition:
def sparse(
    numRows: Int,
    numCols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    values: Array[Double]): Matrix = {
  new SparseMatrix(numRows, numCols, colPtrs, rowIndices, values)
}
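The three arrays in this signature describe a compressed sparse column (CSC) layout: `values` holds the stored entries column by column, `rowIndices` gives the row of each stored entry, and `colPtrs(j)` to `colPtrs(j + 1)` delimits the entries that belong to column j. A small sketch, assuming spark-mllib is on the classpath; the matrix and the names `sm` and `denseValues` are made up for illustration.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// Target 3 x 2 matrix:
//   9.0  0.0
//   0.0  8.0
//   0.0  6.0
val sm: Matrix = Matrices.sparse(
  3,                     // numRows
  2,                     // numCols
  Array(0, 1, 3),        // colPtrs: column 0 owns entry 0, column 1 owns entries 1 and 2
  Array(0, 1, 2),        // rowIndices: row of each stored entry
  Array(9.0, 8.0, 6.0))  // values: the stored entries, column by column

// toArray expands to a dense column-major array, zeros included.
val denseValues: Array[Double] = sm.toArray
```

Note that `colPtrs` always has `numCols + 1` elements, and its last element equals the number of stored entries.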
Distributed block matrix. See: http://de.wikipedia.org/wiki/Blockmatrix
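As a rough illustration of the idea, a distributed block matrix can be sketched with `BlockMatrix` from `org.apache.spark.mllib.linalg.distributed`, which I believe is available from Spark 1.3 onward. The sketch assumes a standalone program that creates its own local-mode context (in spark-shell, reuse the existing `sc` instead); the block contents and all names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

// Local-mode context for illustration only.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("BlockMatrixSketch"))

// Two 2 x 2 blocks on the diagonal of a 4 x 4 matrix, keyed by (blockRow, blockCol).
val blocks = sc.parallelize(Seq[((Int, Int), Matrix)](
  ((0, 0), Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))),
  ((1, 1), Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0)))
))

// rowsPerBlock = 2, colsPerBlock = 2.
val bm = new BlockMatrix(blocks, 2, 2)
bm.validate() // throws if the blocks are inconsistent with the declared block size

val totalRows: Long = bm.numRows()
val totalCols: Long = bm.numCols()

// Collect to a single local matrix; sensible only for small matrices.
val local: Matrix = bm.toLocalMatrix()
sc.stop()
```

Blocks that are not supplied, here (0, 1) and (1, 0), are treated as all zeros when the matrix is collected.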