1. Environment

CDH: 5.16.1
Java: 1.8
Scala: 2.11.8
Spark: 2.2.0
Oozie: 4.1.0
2. Create the spark2 sharelib directory

```
hadoop fs -mkdir /user/oozie/share/lib/lib_20190121152411/spark2
```

Upload the Spark2 jars and copy in the Oozie Spark sharelib jars:

```
hadoop fs -put /opt/cloudera/parcels/SPARK2/lib/spark2/jars/* /user/oozie/share/lib/lib_20190121152411/spark2
hadoop fs -cp /user/oozie/share/lib/lib_20190121152411/spark/oozie-sharelib-spark-4.1.0-cdh5.16.1.jar /user/oozie/share/lib/lib_20190121152411/spark2
hadoop fs -cp /user/oozie/share/lib/lib_20190121152411/spark/oozie-sharelib-spark.jar /user/oozie/share/lib/lib_20190121152411/spark2
```
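Oozie caches the sharelib, so the new spark2 directory may not be picked up until the cache is refreshed. One way to do this without restarting the server (a sketch, assuming the Oozie server address http://cdh01:11000/oozie used elsewhere in this setup) is the admin CLI:

```shell
# Tell the Oozie server to rescan the sharelib directories on HDFS
oozie admin -oozie http://cdh01:11000/oozie -sharelibupdate

# Confirm that the spark2 sharelib is now registered
oozie admin -oozie http://cdh01:11000/oozie -shareliblist spark2
```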
3. Create job.properties, workflow.xml, and a lib directory that holds the jars the job needs.

job.properties:
```
nameNode=hdfs://cdh01:8020
jobTracker=cdh01:8032
master=yarn-cluster
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark2
workflowpath=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark2
userName=root
groupsName=supergroup
# jars in hdfs
oozie.libpath=/user/oozie/share/lib/lib_20190121152411/spark2
oozie.subworkflow.classpath.inheritance=true
# oozie url
oozieUrl=http://cdh01:11000/oozie/
```
workflow.xml:

```xml
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPi'>
    <start to="SparkPi"/>
    <action name="SparkPi">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.action.sharelib.for.spark</name>
                    <value>spark2</value>
                </property>
            </configuration>
            <master>${master}</master>
            <name>SparkPi</name>
            <class>org.apache.spark.examples.SparkPi</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark2/lib/spark-examples_2.11-2.3.0.cloudera3.jar</jar>
            <spark-opts>--deploy-mode cluster --driver-memory 2G --executor-memory 4G --num-executors 5 --executor-cores 2</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail_kill"/>
    </action>
    <kill name="fail_kill">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
While running this I kept hitting: Error: E0701 : E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'name'. One of '{"uri:oozie:spark-action:0.1":master}' is expected.

This was because I had the master and name elements in the wrong order. The spark-action schema requires master to come first; with master before name the workflow runs normally.
4. Upload all of these files to the ${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark2 directory
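The upload itself is plain HDFS CLI work; a minimal sketch, assuming the files were created in the current local directory and the HDFS user is root as in job.properties:

```shell
# Create the application directory on HDFS
hadoop fs -mkdir -p /user/root/examples/apps/spark2

# Upload the workflow definition and the lib directory with the jars
hadoop fs -put workflow.xml lib /user/root/examples/apps/spark2

# job.properties is not uploaded: the Oozie client reads it locally at submit time
```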
5. Launch the Oozie job

It runs successfully.
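Launching uses the standard Oozie CLI; a sketch, assuming job.properties sits in the current directory on the client machine:

```shell
# Submit and start the workflow in one step; prints a job id on success
oozie job -oozie http://cdh01:11000/oozie -config job.properties -run

# Follow progress with the returned id:
# oozie job -oozie http://cdh01:11000/oozie -info <job-id>
```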
Problems encountered:

If the Oozie web UI does not show the detailed logs, open the YARN ResourceManager UI on port 8088 and dig through the application logs there.

Problem:
```
Caused by: org.apache.spark.SparkException: Exception when registering SparkListener
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2371)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:554)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:933)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:924)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:924)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: java.lang.ClassNotFoundException: com.cloudera.spark.lineage.ClouderaNavigatorListener
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2738)
at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2736)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2736)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2360)
at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2359)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2359)
```
Solution:
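A commonly reported cause of this error (an assumption here, since the post does not record the actual fix) is that the cluster-wide spark-defaults.conf registers com.cloudera.spark.lineage.ClouderaNavigatorListener via spark.extraListeners while the jar containing that class is missing from the spark2 sharelib. One possible workaround is to clear the setting for this job in workflow.xml:

```xml
<!-- Hypothetical workaround: append --conf spark.extraListeners= (empty value)
     to spark-opts so Spark does not try to load the Cloudera lineage listener -->
<spark-opts>--deploy-mode cluster --driver-memory 2G --executor-memory 4G --num-executors 5 --executor-cores 2 --conf spark.extraListeners=</spark-opts>
```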