Your First Spark Program: WordCount

1. Using spark-shell
- Prepare the data: create an `input` directory and a `Words.txt` file inside it, then enter the sample data into the file:

```shell
[[email protected] spark-2.1.1]$ mkdir input
[[email protected] input]$ vim Words.txt
```

Contents of `Words.txt`:

```
hello spark hello scala hello world
```
- Launch spark-shell:

```shell
[[email protected] spark-2.1.1]$ bin/spark-shell
```
- Write and run the WordCount program:

```scala
scala> sc.textFile("input/").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect
res0: Array[(String, Int)] = Array((scala,1), (hello,3), (world,1), (spark,1))
```
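The one-liner chains four operations. As a rough sketch of what each stage yields, the same transformations can be traced with plain Scala collections (no Spark required): `flatMap` and `map` behave the same way on a `List`, and `reduceByKey` corresponds to grouping by key and summing. The `ChainTrace` object name is just for this illustration.

```scala
object ChainTrace {
  def main(args: Array[String]): Unit = {
    val lines = List("hello spark hello scala hello world")

    // flatMap(_.split(" ")): split each line into words, flattening the result
    val words = lines.flatMap(_.split(" "))

    // map((_, 1)): pair every word with an initial count of 1
    val pairs = words.map((_, 1))

    // reduceByKey(_ + _) has no direct collections counterpart; group by
    // the word and sum the 1s to get the same per-word totals
    val counts = pairs.groupBy(_._1).map { case (w, ones) => (w, ones.map(_._2).sum) }

    counts.foreach(println)  // e.g. (hello,3), (spark,1), ...
  }
}
```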
2. Using the IDEA Development Tool
- Create a Maven project and add the following dependencies to `pom.xml`:

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <!-- Packaging plugin; without it the Scala classes will not be
             compiled and packaged into the jar -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
- Create a `WordCount.scala` file and implement the following code (the original set the app name to "WorldCount", a typo fixed here):

```scala
package com.guli

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally, using all available cores
    val conf: SparkConf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Read the input, split lines into words, pair each word with 1,
    // sum the counts per word, and collect the result to the driver
    val wcArray: Array[(String, Int)] = sc.textFile("/Users/zgl/Desktop/input")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
    wcArray.foreach(println)
    sc.stop()
  }
}
```
- Output:

```
(scala,1)
(hello,3)
(world,1)
(spark,1)
```
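A common next step is ordering the result by frequency. On the RDD this can be done with `sortBy(_._2, ascending = false)` before `collect()`; since the collected result is an ordinary array, it can equally be sorted on the driver. A minimal sketch with plain Scala (no Spark needed; `SortCounts` and the hardcoded array are illustrative):

```scala
object SortCounts {
  def main(args: Array[String]): Unit = {
    // The collected (word, count) pairs from the job above
    val wcArray = Array(("scala", 1), ("hello", 3), ("world", 1), ("spark", 1))

    // Sort descending by count; negating the count makes sortBy's
    // natural ascending order yield highest-first
    val sorted = wcArray.sortBy(-_._2)

    sorted.foreach(println)  // (hello,3) prints first
  }
}
```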