package com.scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
/**
* Scala/Spark implementation of the PageRank algorithm.
*
* Computes the PageRank of URLs from an input file. Input file should
* be in format of:
* URL neighbor URL
* URL neighbor URL
* URL neighbor URL
* ...
* where URL and their neighbors are separated by space(s).
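*
* For example (illustrative data only), the input file might contain lines such as:
*   1 2
*   1 3
*   2 1
* meaning page 1 links to pages 2 and 3, and page 2 links back to page 1.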
*/
object PageRank {
def main(args: Array[String]): Unit = {
// if (args.length < 1) {
// System.err.println("Usage: SparkPageRank <file> <iter>")
// System.exit(1)
// }
val sparkConf = new SparkConf().setAppName("PageRank").setMaster("local[1]")
val iters = 20
// val iters = if (args.length > 1) args(1).toInt else 10
val ctx = new SparkContext(sparkConf)
val lines = ctx.textFile("page.txt")
// Build an adjacency list from the edge pairs, e.g. (1,(2,3,4,5)) (2,(1,5)) ...
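// links is joined with ranks on every iteration of the loop below, so cache() keeps the parsed adjacency list in memory instead of re-reading and re-parsing the input each time.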
val links = lines.map{ s =>
val parts = s.split("\\s+")
(parts(0), parts(1))
}.distinct().groupByKey().cache()
links.foreach(println)
// Initialise every page's rank to 1.0, e.g. (1,1.0) (2,1.0) ...
var ranks = links.mapValues(v => 1.0)
ranks.foreach(println)
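// Run a fixed number of PageRank iterations: each pass spreads every page's rank evenly across its out-links, then applies the damping factor.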
for (i <- 1 to iters) {
// After the join, each element looks like (1, ((2,3,4,5), 1.0)): a page's out-links paired with its current rank.
val contribs = links.join(ranks).values.flatMap{ case (urls, rank) =>
val size = urls.size
urls.map(url => (url, rank / size))
}
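// Sum the incoming contributions per page and apply the damping factor: new rank = 0.15 + 0.85 * sum(contributions).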
ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
}
val output = ranks.collect()
output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))
ctx.stop()
}
}
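// A minimal way to run this locally, assuming the class has been packaged into a jar
// named pagerank.jar (hypothetical name) and page.txt sits in the working directory:
//   spark-submit --class com.scala.PageRank pagerank.jar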