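Big data platforms increasingly depend on the CDH ecosystem. Most companies deploy their big data stack with CDH, which is a blessing for operations but a real nightmare for developers: downloading the CDH builds of the Spark and Hadoop dependencies is painfully slow, and sometimes fails altogether.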
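Downloading straight from the vendor's overseas repository rarely gets through. The Aliyun Maven repository does not carry the CDH builds of Spark and Hadoop at all. After trying a long list of domestic mirrors, I found that Huawei Cloud does host CDH builds of Spark and Hadoop. However, the Huawei Cloud mirror only carries CDH 6.x artifacts, so if your base environment is still CDH 5.x, the only way to make it work is to combine the Huawei mirror with the vendor repository.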
First, let's look at the setup that uses the vendor (Cloudera) repository directly, in pom.xml:
<properties>
    <scala.main.version>2.11</scala.main.version>
    <spark.version>2.4.0.cloudera2</spark.version>
</properties>
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.main.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.main.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
This approach is simple and direct and has the widest artifact coverage, but it requires plenty of patience.
For configuration details, see: https://docs.cloudera.com/documentation/spark2/2-4-x/topics/spark2_maven_repo.html
https://docs.cloudera.com/documentation/spark2/2-4-x/topics/cds_24_maven_artifacts.html
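Declaring the repository in each pom works, but Maven can also pick it up from settings.xml. A sketch, assuming you want every project on the machine to see the Cloudera repository: the profile id cdh-repos is a made-up name, and disabling snapshots is an optional tweak that stops Maven from querying this slow remote for -SNAPSHOT versions, since the CDS versions used here are releases.

```xml
<!-- settings.xml: these elements sit inside <settings>, next to <mirrors> -->
<profiles>
    <profile>
        <!-- hypothetical profile id; it only has to match the activeProfile below -->
        <id>cdh-repos</id>
        <repositories>
            <repository>
                <id>cloudera</id>
                <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
                <releases>
                    <enabled>true</enabled>
                </releases>
                <!-- release versions such as 2.4.0.cloudera2 are unaffected; snapshot lookups skip this repository -->
                <snapshots>
                    <enabled>false</enabled>
                </snapshots>
            </repository>
        </repositories>
    </profile>
</profiles>
<activeProfiles>
    <activeProfile>cdh-repos</activeProfile>
</activeProfiles>
```

Repository ids declared this way still go through mirror selection, so a mirror whose mirrorOf is * will capture cloudera unless that id is excluded.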
Aliyun has no CDH Spark artifacts at all: https://maven.aliyun.com/mvn/view
To use the Huawei mirror, edit maven_path/conf/settings.xml:
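<mirrors>
    <mirror>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <!-- Maven Central requests go to Aliyun -->
        <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
        <id>huaweicloud</id>
        <!-- requests to every other repository (e.g. cloudera) go to Huawei Cloud -->
        <mirrorOf>*</mirrorOf>
        <url>https://mirrors.huaweicloud.com/repository/maven/</url>
    </mirror>
</mirrors>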
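Huawei Cloud's copy of the Spark artifacts can be browsed at: https://mirrors.huaweicloud.com/repository/maven/org/apache/spark/spark-core_2.11/
Because the Aliyun entry matches central exactly, Maven redirects to Huawei Cloud only the requests aimed at other repositories, so the pom still has to declare a non-central repository such as cloudera.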
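For a CDH 6.x cluster, where the Huawei mirror alone is enough, the artifact versions put a -cdhX.Y.Z suffix on the upstream version instead of the .clouderaN suffix shown above. A sketch, assuming CDH 6.3.2 (which ships Spark 2.4.0 and Hadoop 3.0.0); check what your cluster actually runs, and note that the cdh.version property is a made-up name for this example. The cloudera repository still has to be declared in the pom so the * mirror has something to redirect:

```xml
<!-- Sketch: assumes a CDH 6.3.2 cluster; adjust cdh.version to match yours -->
<properties>
    <scala.main.version>2.11</scala.main.version>
    <cdh.version>cdh6.3.2</cdh.version>
    <spark.version>2.4.0-${cdh.version}</spark.version>
    <hadoop.version>3.0.0-${cdh.version}</hadoop.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.main.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```

Both dependencies are marked provided on the assumption that the cluster supplies Spark and Hadoop at runtime, as the pom further below does for Spark.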
The hybrid setup works as follows:
1. Edit maven_path/conf/settings.xml:
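<mirrors>
    <mirror>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
        <id>huaweicloud</id>
        <!-- every repository except central (matched exactly by alimaven) and the
             ids excluded with "!"; the excluded ids must match the repository ids
             in pom.xml so the CDH 5.x artifacts are fetched from Cloudera directly -->
        <mirrorOf>*,!cloudera,!cloudera_tmp</mirrorOf>
        <url>https://mirrors.huaweicloud.com/repository/maven/</url>
    </mirror>
</mirrors>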
2. Edit pom.xml:
<properties>
    <scala.main.version>2.11</scala.main.version>
    <scala.version>${scala.main.version}.12</scala.version>
    <spark.version>2.4.0.cloudera2</spark.version>
    <mysql.version>5.1.34</mysql.version>
    <neo4j.version>1.7.5</neo4j.version>
    <!-- the cluster supplies Spark at runtime, so it is not packaged into the jar -->
    <scopo_spark>provided</scopo_spark>
</properties>
<repositories>
    <!-- private repository hosted on Aliyun RDC -->
    <repository>
        <id>cloudera_tmp</id>
        <name>cloud</name>
        <url>https://repo.rdc.aliyun.com/repository/82963-release-FnoWLy/</url>
    </repository>
    <!-- official Cloudera repository for the CDH 5.x artifacts; keep this id out of
         any * mirror in settings.xml, or Maven redirects it to that mirror -->
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <!-- scala-tools.org has been offline for years; this entry can most likely be removed -->
    <repository>
        <id>scala-tools.org</id>
        <name>Scala-Tools Maven2 Repository</name>
        <url>http://scala-tools.org/repo-releases</url>
    </repository>
</repositories>
<pluginRepositories>
    <pluginRepository>
        <id>scala-tools.org</id>
        <name>Scala-Tools Maven2 Repository</name>
        <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
</pluginRepositories>
<dependencies>
    <!-- <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.0.cloudera2</version>
        <scope>provided</scope>
    </dependency>-->
    <dependency>
        <groupId>org.scalaj</groupId>
        <artifactId>scalaj-http_${scala.main.version}</artifactId>
        <version>2.4.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.main.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${scopo_spark}</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.main.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${scopo_spark}</scope>
    </dependency>
    <!-- <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.main.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>-->
    <!-- https://mvnrepository.com/artifact/org.scalatest/scalatest -->
    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.main.version}</artifactId>
        <version>3.2.0-SNAP1</version>
        <scope>test</scope>
    </dependency>
    <!-- used by ScalaTest to generate test reports -->
    <dependency>
        <groupId>org.pegdown</groupId>
        <artifactId>pegdown</artifactId>
        <version>1.4.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>${mysql.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.neo4j.driver/neo4j-java-driver -->
    <dependency>
        <groupId>org.neo4j.driver</groupId>
        <artifactId>neo4j-java-driver</artifactId>
        <version>${neo4j.version}</version>
    </dependency>
</dependencies>
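With this in place the downloads go through: the CDH 5.x jars come from the official Cloudera repository, and everything else comes through the mirrors (Maven Central via Aliyun, any other repository via Huawei Cloud).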
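The pom above declares a plugin repository but no Scala compiler plugin. If the project also needs Maven to compile its Scala sources, a build section along these lines is one common option: a sketch using scala-maven-plugin (groupId net.alchim31.maven), which is published to Maven Central and therefore resolves through the Aliyun central mirror. Version 3.2.2 is only an example, and the <build> element goes inside <project>.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <!-- example version, not a requirement -->
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <!-- compile src/main/scala and src/test/scala -->
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <!-- reuse the Scala version defined in <properties> -->
                <scalaVersion>${scala.version}</scalaVersion>
            </configuration>
        </plugin>
    </plugins>
</build>
```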