天天看点

Scheduling a Spark2 job with Oozie: a worked example

1. Runtime environment

CDH: CDH 5.16.1

Java: 1.8

Scala: 2.11.8

Spark: 2.2.0

Oozie: oozie-4.1.0

2. Create a spark2 directory in the Oozie sharelib

hadoop fs -mkdir /user/oozie/share/lib/lib_20190121152411/spark2

Upload the jar files:

hadoop fs -put /opt/cloudera/parcels/SPARK2/lib/spark2/jars/*  /user/oozie/share/lib/lib_20190121152411/spark2

hadoop fs -cp /user/oozie/share/lib/lib_20190121152411/spark/oozie-sharelib-spark-4.1.0-cdh5.16.1.jar /user/oozie/share/lib/lib_20190121152411/spark2

hadoop fs -cp /user/oozie/share/lib/lib_20190121152411/spark/oozie-sharelib-spark.jar /user/oozie/share/lib/lib_20190121152411/spark2
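The Oozie server caches its sharelib layout, so a spark2 subdirectory added to the current lib_<timestamp> directory may not be visible until the server is restarted or told to reload. A minimal Python sketch, assuming the oozie command-line client is on the PATH and the server is at http://cdh01:11000/oozie (the oozieUrl from job.properties), that builds and, when the client is present, runs `oozie admin -sharelibupdate` and then `oozie admin -shareliblist spark2` to confirm the jars are picked up:

```python
import shutil
import subprocess

# Assumption: server URL taken from oozieUrl in job.properties.
OOZIE_URL = "http://cdh01:11000/oozie"


def sharelib_update_cmd(oozie_url: str) -> list[str]:
    """argv that asks the Oozie server to reload its sharelib."""
    return ["oozie", "admin", "-oozie", oozie_url, "-sharelibupdate"]


def sharelib_list_cmd(oozie_url: str, lib: str = "spark2") -> list[str]:
    """argv that lists the jars the server sees for one sharelib."""
    return ["oozie", "admin", "-oozie", oozie_url, "-shareliblist", lib]


def run_if_available(argv: list[str]) -> None:
    # Only execute when the oozie client is installed on this host.
    if shutil.which(argv[0]) is None:
        print("oozie client not found; would run:", " ".join(argv))
        return
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_if_available(sharelib_update_cmd(OOZIE_URL))
    run_if_available(sharelib_list_cmd(OOZIE_URL))
```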


3. Create job.properties and workflow.xml, plus a lib directory that holds the jar to run.

The files are as follows.

job.properties:

nameNode=hdfs://cdh01:8020

jobTracker=cdh01:8032

master=yarn-cluster

queueName=default

examplesRoot=examples

oozie.use.system.libpath=true

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark2

workflowpath=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark2

userName=root

groupsName=supergroup

#jars in hdfs

oozie.libpath=/user/oozie/share/lib/lib_20190121152411/spark2

oozie.subworkflow.classpath.inheritance=true

#oozie url

oozieUrl=http://cdh01:11000/oozie/
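The ${...} references above are expanded when the job is submitted, and user.name is supplied by the Oozie client as the submitting user. To double-check which HDFS directory oozie.wf.application.path points at, here is a minimal Python sketch that approximates that expansion. It handles only simple key=value lines and hardcodes user.name=root as an assumption; references it cannot resolve, such as ${wf:user()}, are left untouched.

```python
import re

# Matches ${name}; undefined names (e.g. EL functions like wf:user()) are left as-is.
_VAR = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text: str) -> dict[str, str]:
    """Small subset of the Java .properties format: key=value lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value: str, props: dict[str, str], max_depth: int = 10) -> str:
    """Expand ${var} references from props, repeating to handle nested references."""
    for _ in range(max_depth):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


if __name__ == "__main__":
    job_properties = """
nameNode=hdfs://cdh01:8020
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/spark2
"""
    props = parse_properties(job_properties)
    props["user.name"] = "root"  # assumption: normally filled in by the Oozie client
    print(resolve(props["oozie.wf.application.path"], props))
    # hdfs://cdh01:8020/user/root/examples/apps/spark2
```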

workflow.xml:

<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPi'>

    <start to="SparkPi" />

    <action name="SparkPi">

        <spark xmlns="uri:oozie:spark-action:0.1">

            <job-tracker>${jobTracker}</job-tracker>

            <name-node>${nameNode}</name-node>

            <configuration>

                <property>

                    <name>oozie.action.sharelib.for.spark</name>

                    <value>spark2</value>

                </property>

            </configuration>

            <master>${master}</master>

            <name>SparkPi</name>

            <class>org.apache.spark.examples.SparkPi</class>

            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark2/lib/spark-examples_2.11-2.3.0.cloudera3.jar</jar>

            <spark-opts> --deploy-mode cluster --driver-memory 2G --executor-memory 4G --num-executors 5 --executor-cores 2</spark-opts>

        </spark>

        <ok to="end" />

        <error to="fail_kill" />

    </action>

    <kill name="fail_kill">

        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>

    </kill>

    <end name="end" />

</workflow-app>

When running it, I kept getting this error: Error: E0701 : E0701: XML schema error, cvc-complex-type.2.4.a: Invalid content was found starting with element 'name'. One of '{"uri:oozie:spark-action:0.1":master}' is expected.


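This happened because I had swapped the order of <master> and <name>. The spark action schema declares its child elements as an ordered sequence, so <master> has to come before <name>; with master first, the job ran normally.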
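Because this error comes purely from element order, a quick local check can catch it before submitting. A minimal Python sketch that walks every <spark> element in a workflow.xml and reports children that appear out of order. The SPARK_ORDER list is my transcription of the spark-action 0.1 schema and is an assumption to verify against the XSD shipped with your Oozie version; the sketch checks order only, not required elements, while `oozie validate workflow.xml` runs the full schema validation.

```python
import xml.etree.ElementTree as ET

SPARK_NS = "uri:oozie:spark-action:0.1"
# Assumed child order of <spark> in spark-action 0.1 (an ordered sequence).
SPARK_ORDER = ["job-tracker", "name-node", "prepare", "job-xml", "configuration",
               "master", "mode", "name", "class", "jar", "spark-opts", "arg"]


def local_name(tag: str) -> str:
    # ElementTree stores namespaced tags as "{namespace}local".
    return tag.rsplit("}", 1)[-1]


def check_spark_order(workflow_xml: str) -> list[str]:
    """Return a message for each <spark> child that appears out of schema order."""
    root = ET.fromstring(workflow_xml)
    problems = []
    for spark in root.iter(f"{{{SPARK_NS}}}spark"):
        last_index, last_name = -1, None
        for child in spark:
            name = local_name(child.tag)
            if name not in SPARK_ORDER:
                problems.append(f"unexpected element <{name}>")
                continue
            index = SPARK_ORDER.index(name)
            if index < last_index:
                problems.append(f"<{name}> must come before <{last_name}>")
            else:
                last_index, last_name = index, name
    return problems


if __name__ == "__main__":
    bad = """<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPi'>
      <start to="SparkPi"/>
      <action name="SparkPi">
        <spark xmlns="uri:oozie:spark-action:0.1">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <name>SparkPi</name>
          <master>${master}</master>
          <jar>app.jar</jar>
        </spark>
        <ok to="end"/><error to="fail_kill"/>
      </action>
      <kill name="fail_kill"><message>failed</message></kill>
      <end name="end"/>
    </workflow-app>"""
    print(check_spark_order(bad))  # ['<master> must come before <name>']
```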

4. Upload all files in this directory to ${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark2
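A minimal Python sketch of this upload step, assuming the hadoop client is on the PATH and the application path resolves to hdfs://cdh01:8020/user/root/examples/apps/spark2 (user root is an assumption). It only wraps the standard `hadoop fs -mkdir -p` and `hadoop fs -put -f` commands. The Oozie client reads job.properties locally at submission time, so copying it to HDFS is harmless but not required; workflow.xml and lib/ are what the server needs.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: oozie.wf.application.path resolved for user root.
APP_PATH = "hdfs://cdh01:8020/user/root/examples/apps/spark2"


def upload_cmds(local_dir: str, app_path: str) -> list[list[str]]:
    """argv lists that copy every entry of a local app directory into app_path on HDFS."""
    cmds = [["hadoop", "fs", "-mkdir", "-p", app_path]]
    for entry in sorted(Path(local_dir).iterdir()):
        # -put copies directories such as lib/ recursively; -f overwrites existing files.
        cmds.append(["hadoop", "fs", "-put", "-f", str(entry), app_path])
    return cmds


if __name__ == "__main__":
    for argv in upload_cmds(".", APP_PATH):
        if shutil.which("hadoop") is None:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)
```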


5. Start the Oozie job
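A workflow is normally started with the Oozie CLI, pointing -config at the local job.properties. A minimal Python sketch, assuming the oozie client is installed on the submitting host and the server URL is the oozieUrl from job.properties, of the usual `oozie job ... -run` command and the `-info` command for checking on the job afterwards:

```python
import shutil
import subprocess

# Assumption: server URL taken from oozieUrl in job.properties.
OOZIE_URL = "http://cdh01:11000/oozie"


def run_cmd(oozie_url: str, config: str = "job.properties") -> list[str]:
    """argv that submits and starts the workflow described by a local job.properties."""
    return ["oozie", "job", "-oozie", oozie_url, "-config", config, "-run"]


def info_cmd(oozie_url: str, job_id: str) -> list[str]:
    """argv that shows the status and actions of a workflow job."""
    return ["oozie", "job", "-oozie", oozie_url, "-info", job_id]


if __name__ == "__main__":
    argv = run_cmd(OOZIE_URL)
    if shutil.which("oozie") is None:
        print("oozie client not found; would run:", " ".join(argv))
    else:
        # On success the client prints the ID of the new workflow job.
        subprocess.run(argv, check=True)
```

The job ID printed by -run can be passed to info_cmd, or looked up in the Oozie web UI.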


The job ran successfully.


Problems encountered:

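If the Oozie web UI doesn't show the detailed log, check the application's logs in the YARN ResourceManager web UI on port 8088; it may take some digging to find the right one.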
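The same logs can also be fetched from the command line with `yarn logs -applicationId <id>`, assuming YARN log aggregation is enabled. In Oozie 4.x the Spark action goes through a MapReduce launcher job, so the Oozie log tends to mention both a job_… ID and the Spark application_… ID; the two ID kinds share the cluster timestamp and sequence number. A minimal Python sketch that pulls every such ID out of a pasted log excerpt and builds the corresponding commands (the sample log line is hypothetical):

```python
import re

# MapReduce job IDs (e.g. the Oozie launcher) and YARN application IDs share the
# cluster timestamp and sequence number, so both map to application_<ts>_<seq>.
_ID = re.compile(r"\b(?:application|job)_(\d+)_(\d+)\b")


def yarn_log_cmds(text: str) -> list[list[str]]:
    """Build `yarn logs -applicationId` argv lists for every distinct ID found in text."""
    app_ids: list[str] = []
    for ts, seq in _ID.findall(text):
        app_id = f"application_{ts}_{seq}"
        if app_id not in app_ids:
            app_ids.append(app_id)
    return [["yarn", "logs", "-applicationId", app_id] for app_id in app_ids]


if __name__ == "__main__":
    # Hypothetical log excerpt; paste the real Oozie action log here instead.
    sample = "launcher job_1548059234123_0007 started application_1548059234123_0008"
    for argv in yarn_log_cmds(sample):
        print(" ".join(argv))
```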

Problem:

Caused by: org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2371)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:554)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:933)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:924)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:924)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:688)
Caused by: java.lang.ClassNotFoundException: com.cloudera.spark.lineage.ClouderaNavigatorListener
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2738)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2736)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2736)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2360)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2359)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2359)      
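The innermost Caused by shows Utils.loadExtensions, called from SparkContext.setupAndStartListenerBus, failing to load com.cloudera.spark.lineage.ClouderaNavigatorListener. That code path instantiates the classes listed in the spark.extraListeners setting, so the class name comes from configuration (presumably inherited from the cluster's Spark2 configuration, which is an assumption), while the jar containing it is not on the driver's classpath when launched through Oozie. A minimal Python sketch for checking which listener classes a given spark-defaults.conf requests; it understands only simple key/value lines, and the file path passed on the command line is up to you:

```python
import re
import sys

# key, then '=', ':' or whitespace, then value (a subset of the .properties syntax).
_LINE = re.compile(r"^\s*([^\s=:#!]+)\s*[=:\s]\s*(.*?)\s*$")


def extra_listeners(spark_defaults: str) -> list[str]:
    """Listener classes requested via spark.extraListeners; the last definition wins."""
    classes: list[str] = []
    for line in spark_defaults.splitlines():
        m = _LINE.match(line)
        if m and m.group(1) == "spark.extraListeners":
            classes = [c.strip() for c in m.group(2).split(",") if c.strip()]
    return classes


if __name__ == "__main__":
    # Hypothetical usage: python listeners.py /path/to/spark-defaults.conf
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            print(extra_listeners(f.read()))
```

Once the class is known, the options are either making its jar visible to the driver or not configuring that listener for this job; which fits depends on whether the lineage listener is needed.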

Solution:
